


On Non-coherent MIMO Channels in the Wideband 
Regime: Capacity and Reliability * 

Siddharth Ray^ Muriel Medard^ Lizhong Zheng^ 



Abstract 



We consider a multiple-input, multiple-output (MIMO) wideband Rayleigh block
fading channel where the channel state is unknown to both the transmitter and the
receiver and there is only an average power constraint on the input. We compute the
capacity and analyze its dependence on coherence length, number of antennas and
receive signal-to-noise ratio (SNR) per degree of freedom. We establish conditions on the
coherence length and number of antennas for the non-coherent channel to have a "near
coherent" performance in the wideband regime. We also propose a signaling scheme
that is near-capacity achieving in this regime.

We compute the error probability for this wideband non-coherent MIMO channel and
study its dependence on SNR, number of transmit and receive antennas and coherence
length. We show that error probability decays inversely with coherence length
and exponentially with the product of the number of transmit and receive antennas.
Moreover, channel outage dominates error probability in the wideband regime. We also
show that the critical as well as cut-off rates are much smaller than channel capacity
in this regime.

1 Introduction 

Recent years have seen the emergence of high data rate, third generation wideband wireless
communication standards like wideband code division multiple access (W-CDMA) and
Ultra-wideband (UWB) radio. Motivated by the ever increasing demand for higher wideband
wireless data rates, we consider multiple antenna communication over the wideband wireless
channel.

At the cost of additional signal processing (which is getting cheaper with rapid advances
in VLSI technology), multiple-input, multiple-output (MIMO) systems are known
to considerably improve the performance of wireless systems in terms of reliability as well as
throughput, without requiring additional resources such as bandwidth and power. However,
multiple antenna research has focused primarily on the regime where the received signal-to-noise
ratio (SNR) per degree of freedom is high. Such a regime operates in essence as a
narrowband regime. We now study the performance of MIMO at the other extreme, i.e.,
when the available bandwidth is large, which takes us to the regime where the SNR per
degree of freedom is low.

In wideband channels, the available power is spread over a large number of degrees of freedom. 
This makes the SNR per degree of freedom low. Hence, while studying these channels, we 
need to focus on the low SNR regime. We will therefore use the terms "wideband" and "low 
SNR" interchangeably, with the understanding that the latter refers to the SNR per degree 
of freedom. 

The study of single antenna wideband channels dates back to 1969 and early work has 
considered the Rayleigh fading channel model. Kennedy [2j shows that the capacity of 
an infinite bandwidth Rayleigh fading channel is the same as that of an infinite bandwidth 
additive white Gaussian noise (AWGN) channel with the same average received power. Using 
the results of Gallager [3 , Telatar j3] obtains the capacity per unit energy for the Rayleigh 
fading channel as a function of bandwidth and signal energy, concluding that given an average 
power constraint, the Rayleigh fading and AWGN channels have the same capacity in the 
limit of infinite bandwidth. Telatar and Tse ^2] show that this property of the channel 
capacity is also found in channels with general fading distributions. 

Medard and Gallager [HI EI establish that very large bandwidths yield poor performance 
for systems that spread the available power uniformly over time and frequency (for example 
DS-CDMA). They express the input process as an orthonormal expansion of basis functions 
localized in time and frequency. The energy and fourth moment of the coefficients scale 
inversely with the bandwidth and square of the bandwidth, respectively. By constraining 
the fourth moment (as is the case when using spread spectrum signals), they show that 
mutual information decays to 0 inversely with increasing bandwidth. Telatar and Tse ^2] 
consider a wideband fading channel to be composed of a number of time-varying paths 
and show that the input signals needed to achieve capacity must be "peaky" in time or 
frequency. They also show that if white-like signals are used (as for example in spread 
spectrum communication), the mutual information is inversely proportional to the number 
of resolvable paths with energy spread out and approaches 0 as the number of paths gets 
large. This does not depend on whether the paths are tracked perfectly at the receiver or 
not. A strong coding theorem is obtained for this channel in j22]- Subramanian and Hajek 
[TH] derive similar results as [HI CHI using the theory of capacity per unit cost, for a certain 
fourth-order cost function, called fourthegy. 

We now consider the use of multiple antennas over these channels. MIMO channels were 
first studied from a capacity point of view in [SJ IH] . In a Rayleigh flat-fading environment 
with perfect channel state information (CSI) at the receiver (coherent channel) but no CSI 
at the transmitter, and statistically independent propagation coefficients between all pairs 
of transmit and receive antennas, the multiple antenna capacity increases linearly with the 
smaller of the number of transmit and receive antennas, provided the signal-to-noise ratio is 
high [5]. 

When the coherence time of the channel is small (for example, if the receiver is mobile),
communication is desirable without training. Here, CSI is unavailable at the transmitter as
well as the receiver. This channel is also referred to as the non-coherent channel. In |H|, 
Marzetta and Hochwald derive the structure of the optimal input matrix as a product of two 
statistically independent matrices; one of them being an isotropically distributed unitary 
matrix and the other being a diagonal, real and non-negative matrix. They also show that 
there is no gain, from the point of view of capacity, in having the number of transmit antennas 
be more than the coherence interval (in symbols) of the channel. Zheng and Tse ^3] obtain 
the non-coherent MIMO capacity in the high SNR regime and show that, in this regime, the 
number of transmit antennas required need not be more than half the coherence interval (in 
symbols). 

In this paper, we assume that neither the transmitter nor the receiver has channel state
information (CSI), i.e., we study the non-coherent channel. We also assume Rayleigh
block fading. In the limit of infinite bandwidth, Zheng and Tse fT^ show that the capacities
per degree of freedom for the coherent and non-coherent MIMO channels are the same, i.e.,

$$\lim_{\mathrm{SNR}\to 0} \frac{C_{\text{coherent}}(\mathrm{SNR})}{\mathrm{SNR}} = \lim_{\mathrm{SNR}\to 0} \frac{C_{\text{non-coherent}}(\mathrm{SNR})}{\mathrm{SNR}} = r,$$

where r is the number of receive antennas and SNR is the average signal-to-noise ratio per
degree of freedom at each receive antenna. The capacity can thus be expressed as:

C(SNR) = rSNR + o(SNR) nats/channel use 

and is thus a linear function only in the limit of low SNR. As SNR increases from 0, capacity 
increases in a sublinear fashion, showing that low SNR communication is power efficient. 
Using a Taylor series expansion, Verdu ^7] shows that the second derivative of the capacity 
at SNR = 0 is finite for the coherent channel. The impact on the coherent capacity of antenna
correlation, Ricean factors, polarization diversity and out-of-cell interference is considered
in [21]. For the non-coherent channel, Verdu ^7j shows that though "flash" signaling is first
order optimal, it renders the second derivative $-\infty$. Hence, the coherent and non-coherent
channels have the same linear term and differ in their sublinear term. Therefore, the
non-coherent channel capacity approaches the wideband limit more slowly than the coherent
channel capacity.

Let us define the sublinear term for the MIMO channel with t transmit and r receive antennas 
as 

$$\Delta^{(t,r)}(\mathrm{SNR}) = r\,\mathrm{SNR} - C(\mathrm{SNR}) \quad \text{nats/channel use}.$$

Computing the sublinear term tells us the capacity and also quantifies the convergence of
the capacity function to the low SNR limit. The larger the order of the sublinear term, the
faster the convergence. Using the results of Verdu J7j, the sublinear term for the Rayleigh fading
coherent MIMO channel, $\Delta^{(t,r)}_{\text{coherent}}(\mathrm{SNR})$, is

$$\Delta^{(t,r)}_{\text{coherent}}(\mathrm{SNR}) = \frac{r(r+t)}{2t}\,\mathrm{SNR}^2 + o(\mathrm{SNR}^2).$$
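
As a rough numerical illustration of the coherent sublinear term, the following sketch compares a Monte Carlo estimate of the coherent capacity, $E[\log\det(I_r + \frac{\mathrm{SNR}}{t} H H^\dagger)]$, with the second-order expression $r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\mathrm{SNR}^2$. The function names, the number of trials and the seed are illustrative assumptions and are not part of the paper; NumPy is assumed to be available.

```python
import numpy as np


def coherent_capacity_mc(t, r, snr, trials=20000, seed=0):
    """Monte Carlo estimate (in nats) of E[log det(I + (snr/t) H H^dagger)].

    H is r x t with i.i.d. CN(0, 1) entries (assumed model, as in the text).
    """
    rng = np.random.default_rng(seed)
    shape = (trials, r, t)
    h = (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)
    gram = h @ np.conj(np.swapaxes(h, 1, 2))  # H H^dagger, shape (trials, r, r)
    _, logdet = np.linalg.slogdet(np.eye(r) + (snr / t) * gram)
    return float(np.mean(logdet))


def coherent_capacity_second_order(t, r, snr):
    """Second-order low-SNR expression: r*SNR - r(r+t)/(2t) * SNR^2."""
    return r * snr - r * (r + t) / (2.0 * t) * snr ** 2


if __name__ == "__main__":
    for snr in (0.01, 0.02):
        print(snr, coherent_capacity_mc(2, 2, snr), coherent_capacity_second_order(2, 2, snr))
```

The Monte Carlo estimate carries sampling noise, so the comparison is only meaningful to within a few standard errors.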

On the other extreme, for the i.i.d. Rayleigh fading non-coherent MIMO channel, the sublinear
term $\Delta^{(t,r)}_{\text{i.i.d.}}(\mathrm{SNR}) > O(\mathrm{SNR}^2)$ [HI. In this paper, we compute $\Delta^{(t,r)}_{\text{i.i.d.}}(\mathrm{SNR})$ and show that
on-off signaling achieves capacity for the i.i.d. Rayleigh fading non-coherent MIMO channel.
Figure 1 shows the sublinear terms for the Rayleigh fading coherent channel and the i.i.d.
Rayleigh fading non-coherent channel with the same number of transmit and receive antennas.
A property of the non-coherent capacity is that it tends towards the coherent capacity
as the coherence length increases. Hence, the sublinear term for the i.i.d. Rayleigh fading
non-coherent channel is the largest (non-coherent extreme), whereas, for the coherent channel,
it is the smallest (coherent extreme). In this paper, we focus on how the non-coherent
MIMO channel capacity is influenced by the coherence length, number of antennas and SNR.
We do so by computing the sublinear term, which in turn tells us the capacity of the low
SNR non-coherent MIMO channel of arbitrary coherence length. Thereby, we sweep the
region, shown in Figure 1, between the coherent and non-coherent extremes.

Figure 1: Sublinear capacity term.

In the low SNR regime, the sublinear term also represents the energy efficiency of communication.
Let $E_n$ and $N_0$ represent the energy per information nat and the noise spectral level,
respectively. We have:

$$\frac{E_n}{N_0} = \frac{\mathrm{SNR}}{C(\mathrm{SNR})} = \frac{\mathrm{SNR}}{r\,\mathrm{SNR} - \Delta^{(t,r)}(\mathrm{SNR})} = \frac{1}{r}\cdot\frac{1}{1 - \frac{\Delta^{(t,r)}(\mathrm{SNR})}{r\,\mathrm{SNR}}}.$$

Taking logarithms on both sides, and using $\log(1-x) \approx -x$ for small $x$,

$$\log\left(\frac{E_n}{N_0}\right) \approx -\log(r) + \frac{\Delta^{(t,r)}(\mathrm{SNR})}{r\,\mathrm{SNR}}. \qquad (1)$$

Equation (1) shows how energy efficiency is related to the sublinear term. The smaller the
sublinear term for a channel, the more energy efficient it is. As the non-coherent capacity
is always less than the coherent capacity for the same number of transmit and receive
antennas, lack of receiver CSI results in energy inefficiency. Also, note that the minimum
energy (in dB) required to reliably transmit one information nat decreases logarithmically
with the number of receive antennas.
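
To make the relation between energy per nat and the sublinear term concrete, here is a minimal sketch that evaluates $E_n/N_0 = \mathrm{SNR}/C(\mathrm{SNR})$ with $C(\mathrm{SNR}) = r\,\mathrm{SNR} - \Delta$ and compares its logarithm with the first-order expression $-\log r + \Delta/(r\,\mathrm{SNR})$. The function names are hypothetical, and the numerical value of the sublinear term in the example is an arbitrary assumption chosen only for illustration.

```python
import math


def energy_per_nat(r, snr, sublinear):
    """E_n / N_0 = SNR / C(SNR), with C(SNR) = r*SNR - sublinear (nats/channel use)."""
    capacity = r * snr - sublinear
    if capacity <= 0:
        raise ValueError("sublinear term must be smaller than r*SNR")
    return snr / capacity


def log_energy_per_nat_first_order(r, snr, sublinear):
    """First-order approximation: -log(r) + sublinear / (r*SNR)."""
    return -math.log(r) + sublinear / (r * snr)


if __name__ == "__main__":
    r, snr, delta = 2, 0.01, 2e-4  # delta is an illustrative value only
    exact = math.log(energy_per_nat(r, snr, delta))
    approx = log_energy_per_nat_first_order(r, snr, delta)
    print(exact, approx)
```

With a zero sublinear term the function returns exactly $1/r$, which is the wideband limit of the energy per nat in this normalization.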

Let us now turn to Figure 2, which shows how wideband capacity changes with bandwidth.
We denote by P the average receive power and by $N_0$ the noise spectral density, which makes
the wideband limit $rP/N_0$ nats/sec. We obtain this figure by scaling the y-axis of Figure 1 by
the bandwidth. Channels whose capacities converge slowly to the wideband limit have to
incur large bandwidth penalties. For the same number of transmit and receive antennas, the
non-coherent capacity is less than the coherent capacity. Thus, the non-coherent channel
requires a larger bandwidth in order to reliably support the same throughput as the coherent
channel. This bandwidth penalty grows with bandwidth. Hence, for the non-coherent
channel, the closer we get to the wideband limit, the more we gain in terms of energy efficiency
(as the sublinear term decreases), but the larger the bandwidth penalty becomes. We quantify
this effect by computing the low SNR non-coherent MIMO capacity. Studying how capacity changes
with coherence length also tells us the amount of bandwidth required to achieve a "near
coherent" performance. Note that since the bandwidth penalty increases with decreasing
coherence length, the channel at the non-coherent extreme (i.i.d. Rayleigh fading non-coherent
channel) has to incur the largest bandwidth penalty.

Figure 2: Wideband capacity.

At low SNR, channel estimates become unreliable. Hence, even for slowly varying channels,
estimating the channel at the receiver may not be possible. When we have multiple antennas
at the receiver as well as the transmitter, estimation becomes much more difficult since
there are multiple channel coefficients that need to be estimated. Hence, communication
is desirable without training. In |2S1, non-coherent communication is considered with the
input distribution constrained to be exponentially decaying. It is shown that the capacity
per degree of freedom in the low SNR regime is $O(\mathrm{SNR}^2)$. Reference ^24, considers the same
capacity under the constraint that only the fourth and sixth order moments of the input are
finite. Once again, the non-coherent capacity per degree of freedom is shown to be $O(\mathrm{SNR}^2)$.
Hence, fl^ |21] show that when there is a higher order (fourth and above) constraint on
the input, capacity scales inversely with bandwidth. Thus, the non-coherent capacity does
not approach the wideband limit and diverges from the coherent capacity as bandwidth
increases. These results are akin to the single antenna channel results [HI El El Ej. Hassibi
and Hochwald pj^ propose a training scheme that is near-optimal in the high SNR
regime. However, at low SNR, their scheme results in a rate per degree of freedom that goes as
$O(\mathrm{SNR}^2)$. Since the overall rate decays to 0 inversely with bandwidth, their training scheme
is not desirable at low SNR.

In this paper, we consider multiple antenna communication over a wideband, non-coherent
Rayleigh block fading channel. We compute the capacity with only an average power constraint,
and consider its interaction with the coherence length of the channel, number of
transmit and receive antennas and SNR. We establish how large the coherence length has to
be in order for a non-coherent channel to have a "near coherent" performance at low SNR.
More specifically, we show that if the channel coherence length is above a certain antenna
and SNR dependent threshold, the non-coherent and coherent capacities are the same in the
low SNR regime. We show that the transmit antennas affect the sublinear capacity term and
hence the approach of capacity to the wideband limit with increasing bandwidth. Moreover,
we propose a signaling scheme that is near-optimal in the wideband regime.

The capacity problem that we consider in this paper has been considered for single antenna
channels by Zheng, Tse and Medard [2E]. They consider the interaction between coherence
length and capacity at low SNR and compute the order of the sublinear capacity term. The
work in this paper builds on their work: we analyze the more general MIMO channel
and exactly compute the sublinear capacity term. We use a finer scale of analysis than j26.j,
which allows us to understand how the transmit and receive antennas affect the sublinear
capacity term and hence the approach of the non-coherent capacity to the wideband capacity
limit.

We also analyze the error probability for the non-coherent low SNR MIMO channel. The 
behavior of error probability for the coherent [71120] as well as non-coherent [10 , 1^ MIMO 
channels has been well studied in the high SNR regime. For the coherent MIMO channel 
with coherence length 1, the error exponent is computed by Telatar p] for any SNR. The 
behavior of the error exponent for the non-coherent MIMO channel in the low SNR regime 
has recently been considered by Wu and Srikant in j2SI- Their analysis considers the linear 
capacity term, rSNR, and the error exponent is computed by fixing the coherence length and 
letting SNR tend to 0. 

Our consideration of the effect of the interaction among SNR, number of transmit and receive 
antennas and coherence length, on the error probability, yields a more detailed characteri- 
zation of the error probability behavior than described in j^H]- Our analysis shows that in 
the low SNR regime, the critical rate as well as the cut-off rate are much smaller than the 
channel capacity. Moreover, the error probability decays inversely with coherence length. 
We introduce the notion of "diversity" in the low SNR regime and use it to show that error 
probability decays exponentially with the product of the number of transmit and receive 
antennas. Hence, in terms of reliability in the wideband regime, transmit antennas have the 
same importance as receive antennas. In the high SNR regime, it is well known that outage 
dominates the error probability. Our analysis shows that this is true even at low SNR, i.e., 
channel outage dominates the error probability at low SNR. 

Let us establish notation that will be used in the rest of the paper. The bold type will be 
used to denote random quantities whereas normal type will be used to denote deterministic 
ones. Matrices will be denoted by capital letters and the scalar or vector components of 
matrices will be denoted using appropriate subscripts. Vectors will be represented by small 
letters with an arrow over them. All vectors are column vectors unless they have a
superscript $T$. Scalars will be represented by small letters only. The superscript $\dagger$ will be used to
denote the complex conjugate transpose.

The rest of the paper is organized as follows: Section 2 describes the channel model. The 
capacity and error probability results are in sections 3 and 4, respectively. We conclude in 
section 5. 

2 Model



We model the wideband channel as a set of N parallel narrowband channels. In general, the 
narrowband channels will be correlated. We restrict our analysis in this paper to channels 
having independent and identical statistics. We also assume that the coherence bandwidth 
is much larger than the bandwidth of the narrowband channel. Hence, each narrowband 
channel is modeled as being flat faded. From [2^1; we see that the behavior of channels with 
low SNR per degree of freedom is robust to reasonable modeling assumptions and necessary 
simplifications. Hence, the results for a more precise MIMO channel model will not differ 
significantly from that of the simple model we consider in this paper. 

Using the sampling theorem, the $m^{\text{th}}$ narrowband channel at symbol time k can be
represented as:

$$\mathbf{y}[k,m] = \mathbf{H}[k,m]\,\mathbf{x}[k,m] + \mathbf{w}[k,m],$$

where $\mathbf{H}[k,m]$, $\mathbf{x}[k,m]$, $\mathbf{w}[k,m]$ and $\mathbf{y}[k,m]$ are the channel matrix, input vector, noise
vector and output vector, respectively, for the $m^{\text{th}}$ narrowband channel at symbol time k.
The pair (k, m) may be considered as an index for the time-frequency slot, or degree of
freedom, to communicate. We denote the number of transmit and receive antennas by t
and r, respectively. Hence, $\mathbf{x}[k,m] \in \mathbb{C}^{t}$ and $\mathbf{y}[k,m], \mathbf{w}[k,m] \in \mathbb{C}^{r}$. The channel matrix
$\mathbf{H}[k,m]$ is an $r \times t$ complex matrix. The entries of the channel matrix are i.i.d. zero-mean
complex Gaussian, with independent real and imaginary components. Equivalently, each
entry of $\mathbf{H}[k,m]$ has uniformly distributed phase and Rayleigh distributed magnitude. We
thus model a Rayleigh fading channel with enough separation within the transmitting and
receiving antennas to achieve independence in the entries of $\mathbf{H}[k,m]$. The channel matrix is
unknown at the transmitter and the receiver. However, its statistics are known to both. The
noise vector $\mathbf{w}[k,m]$ is a zero-mean Gaussian vector with the identity as its covariance matrix.
Thus, $\mathbf{w}[k,m] \sim \mathcal{CN}(0, I_r)$. Since the narrowband channels are assumed to be independent,
we will omit the narrowband channel index, m, for simplifying notation. The capacity of the
wideband channel with power constraint P is thus N times the capacity of each narrowband
channel with power constraint P/N. Hence, we can focus on the narrowband channel alone.
We further assume a block fading channel model, i.e., the channel matrix is random but fixed
for the duration of the coherence time of the channel, and is i.i.d. across blocks. Hence, we
may omit the time index, k, and express the narrowband channel within a coherence block
of length l symbols as:

$$\mathbf{Y} = \mathbf{H}\mathbf{X} + \mathbf{W},$$

where $\mathbf{X} \in \mathbb{C}^{t \times l}$ has entries $x_{ij}$, $i = 1, \ldots, t$, $j = 1, \ldots, l$, being the signals transmitted from
the transmit antenna i at time j; $\mathbf{Y} \in \mathbb{C}^{r \times l}$ has entries $y_{ij}$, $i = 1, \ldots, r$, $j = 1, \ldots, l$, being the
signals received at the receive antenna i at time j; the additive noise $\mathbf{W}$ has i.i.d. entries
$w_{ij}$, which are distributed as $\mathcal{CN}(0, 1)$. The input $\mathbf{X}$ satisfies the average power constraint

$$E\left[\mathrm{tr}\left(\mathbf{X}\mathbf{X}^{\dagger}\right)\right] = l\,\mathrm{SNR},$$

where SNR is the average signal to noise ratio at each receive antenna per narrowband
channel. As N tends to $\infty$, SNR tends to 0, and the narrowband channel is in the low SNR
regime.
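
The block fading model of this section can be sketched in a few lines of simulation code. The example below draws one coherence block, with an $r \times t$ channel matrix of i.i.d. $\mathcal{CN}(0,1)$ entries held fixed over $l$ symbols and $\mathcal{CN}(0,1)$ additive noise. The helper names `complex_gaussian` and `simulate_block` are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def complex_gaussian(rng, shape, variance=1.0):
    """i.i.d. CN(0, variance) samples: independent real and imaginary parts,
    each with variance `variance / 2`."""
    scale = np.sqrt(variance / 2.0)
    return scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))


def simulate_block(x, r, rng):
    """Pass one coherence block X (t x l) through Y = H X + W.

    H (r x t) is drawn once and stays fixed for the whole block; the receiver
    in the non-coherent setting would not be given H (it is returned here only
    for inspection).
    """
    t, l = x.shape
    h = complex_gaussian(rng, (r, t))
    w = complex_gaussian(rng, (r, l))
    return h @ x + w, h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t, r, l, snr = 2, 3, 8, 0.01
    x = complex_gaussian(rng, (t, l), variance=snr / t)  # i.i.d. Gaussian input
    y, _ = simulate_block(x, r, rng)
    print(y.shape)
```

Independent blocks are obtained by calling `simulate_block` repeatedly, which redraws the channel matrix each time.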

3 Capacity of the Non-coherent MIMO channel



In this section, we compute the capacity of the non-coherent MIMO channel at low SNR. 
The analysis shows the interaction between the number of receive and transmit antennas,
the coherence length of the channel and the SNR in the wideband regime. We also propose a
signaling scheme that achieves capacity.



3.1 Dependence of capacity on coherence length 



We first analyze the dependence of the non-coherent capacity on the coherence length of the
channel. In p|, the structure of the capacity achieving input matrix for our non-coherent
MIMO channel model is described as

$$\mathbf{X} = \mathbf{A}\mathbf{\Phi},$$

where

$$\mathbf{A} = \begin{bmatrix} \|\mathbf{x}_1^T\| & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \|\mathbf{x}_t^T\| & 0 & \cdots & 0 \end{bmatrix}$$

is a $t \times l$ random matrix that is diagonal, real and nonnegative with identically distributed
(though not necessarily independent) entries, and $\|\mathbf{x}_i^T\|$ is the norm of the signal vector transmitted
by the $i^{\text{th}}$ antenna. Since these entries are identically distributed, we have, for all $i \in \{1, \ldots, t\}$,

$$E\left[\|\mathbf{x}_i^T\|^2\right] = \frac{l}{t}\,\mathrm{SNR}.$$

$\mathbf{\Phi}$ is an $l \times l$ isotropically distributed unitary matrix. The row vectors of $\mathbf{\Phi}$ are isotropic
random vectors which represent the direction of the signal transmitted from the antennas.
$\mathbf{A}$ and $\mathbf{\Phi}$ are statistically independent matrices. Since this structure of the input matrix is
optimal, we will restrict our attention to inputs having such structure.

We first prove Lemma 1, which establishes two necessary conditions the input distribution
must satisfy for the mutual information of the channel to be above a certain value. This
lemma will be used in Theorem 1 to establish the dependence of the non-coherent capacity
on the channel coherence length.

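Lemma 1 For any $\alpha \in (0, 1]$ and $\gamma \in (0, \alpha)$, if there exists an input distribution on $\mathbf{X}$ such
that

$$\frac{1}{l} I(\mathbf{X}; \mathbf{Y}) \ge r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\gamma}),$$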

then the following two conditions are satisfied by this distribution: 
t 



:E 



V 
tE 



for all i & {1, ,t}. 



logfl + llx,^ 



T'||2\ 



log 1 



- ■^^-^SNR^+" + 0(SNR^+"+^), 
> SNR- ^^-^SNR^+" + 0(SNR^+"+^) 



(2) 
(3) 
Proof: See Appendix 1. □ 

We are now ready to prove a theorem that describes the dependence of the non-coherent 
capacity on the coherence length: 

Theorem 1 Consider a non-coherent Rayleigh block fading MIMO channel with average
signal to noise ratio SNR. Let the block length be $l$ and the capacity be $C(\mathrm{SNR})$. For any
$\alpha \in (0, 1]$ and $\gamma \in (0, \alpha)$, if

$$C(\mathrm{SNR}) \ge C^*(\mathrm{SNR}) \triangleq r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\gamma}),$$

then

$$l > l_{\min} \triangleq \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\alpha}.$$

Proof: See Appendix 2. □

This theorem states that the coherence length must be strictly larger than $l_{\min}$ for the channel
capacity to be above $C^*(\mathrm{SNR})$. Since the inequality for the coherence length is strict, this
implies that a channel with capacity $C^*(\mathrm{SNR})$ will have its coherence length, $l^*$, strictly
greater than $l_{\min}$, i.e.,

$$l_{\min} < l^*.$$

3.2 Communicating using Gaussian-like signals

In this subsection, we propose a signaling scheme using which a rate of $C^*(\mathrm{SNR})$ is achievable
if the coherence length is greater than or equal to a threshold, which we denote as $l_G$.
We first prove a lemma that shows that using a Gaussian input distribution, we can achieve
"near coherent" performance if the coherence length of the channel is large enough.

Lemma 2 Consider a non-coherent Rayleigh block fading MIMO channel with average
signal-to-noise ratio SNR. Let the block length be $l$ and the capacity be $C(\mathrm{SNR})$. If we use Gaussian
signals over this channel, then for any $\epsilon \in (0, 1)$, if

$$l \ge \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(1+\epsilon)},$$

then

$$C(\mathrm{SNR}) \ge r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^2 + O(\mathrm{SNR}^{2+\epsilon}).$$

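Proof: We first lower bound the mutual information of the non-coherent MIMO
channel as

$$I(\mathbf{X}; \mathbf{Y}) = I(\mathbf{X}; \mathbf{Y}|\mathbf{H}) + I(\mathbf{H}; \mathbf{Y}) - I(\mathbf{H}; \mathbf{Y}|\mathbf{X}) \ge I(\mathbf{X}; \mathbf{Y}|\mathbf{H}) - I(\mathbf{H}; \mathbf{Y}|\mathbf{X}). \qquad (4)$$

Let us choose the distribution of $\mathbf{X}$ to be one where all the entries of $\mathbf{X}$ are i.i.d. and
$\mathcal{CN}(0, \frac{\mathrm{SNR}}{t})$. Note that it is exactly this distribution that achieves capacity for the coherent
MIMO channel. Therefore

$$\frac{1}{l} I(\mathbf{X}; \mathbf{Y}|\mathbf{H}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^2 + O(\mathrm{SNR}^3). \qquad (5)$$

$I(\mathbf{H}; \mathbf{Y}|\mathbf{X})$ is the information that can be obtained about $\mathbf{H}$ from observing $\mathbf{Y}$, conditioned
on $\mathbf{X}$ being known. Therefore

$$I(\mathbf{H}; \mathbf{Y}|\mathbf{X}) = h(\mathbf{Y}|\mathbf{X}) - h(\mathbf{Y}|\mathbf{X}, \mathbf{H}) = rt\,E\left[\log\left(1 + \|\mathbf{x}_i^T\|^2\right)\right] \le rt \log\left(1 + \frac{l}{t}\,\mathrm{SNR}\right), \qquad (6)$$

where we have used Jensen's inequality to get the upper bound in (6). Combining (4), (5)
and (6), and noting that

$$C(\mathrm{SNR}) \ge \frac{1}{l} I(\mathbf{X}; \mathbf{Y}),$$

we obtain:

$$C(\mathrm{SNR}) \ge r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^2 - \frac{rt}{l} \log\left(1 + \frac{l}{t}\,\mathrm{SNR}\right) + O(\mathrm{SNR}^3). \qquad (7)$$

For any $\epsilon \in (0, 1]$, let us choose

$$l = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(1+\epsilon)}.$$

Therefore,

$$\begin{aligned}
\frac{rt}{l} \log\left(1 + \frac{l}{t}\,\mathrm{SNR}\right)
&= \frac{r(r+t)^2}{t}\,\mathrm{SNR}^{2(1+\epsilon)} \log\left(1 + \frac{t}{(r+t)^2\,\mathrm{SNR}^{1+2\epsilon}}\right) \\
&= \frac{r(r+t)^2}{t}\,\mathrm{SNR}^{2(1+\epsilon)} \log\left(\frac{t}{(r+t)^2}\right) + \frac{r(r+t)^2}{t}(1+2\epsilon)\,\mathrm{SNR}^{2+\epsilon}\,\mathrm{SNR}^{\epsilon} \log\left(\frac{1}{\mathrm{SNR}}\right) + o(\mathrm{SNR}^{2(1+\epsilon)}) \\
&\le \frac{r(r+t)^2}{t}\,\mathrm{SNR}^{2(1+\epsilon)} \log\left(\frac{t}{(r+t)^2}\right) + \frac{r(r+t)^2}{t}(1+2\epsilon)\,\mathrm{SNR}^{2+\epsilon} + o(\mathrm{SNR}^{2(1+\epsilon)}) \qquad (8) \\
&= O(\mathrm{SNR}^{2+\epsilon}).
\end{aligned}$$

In (8), we use that, since $\epsilon > 0$ and $\mathrm{SNR} \to 0$, $\mathrm{SNR}^{\epsilon} \log\left(\frac{1}{\mathrm{SNR}}\right) < 1$ for sufficiently small SNR. Since $\frac{rt}{l} \log\left(1 + \frac{l}{t}\,\mathrm{SNR}\right)$
decreases monotonically with $l$, we have that, for $l \ge \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(1+\epsilon)}$,

$$\frac{rt}{l} \log\left(1 + \frac{l}{t}\,\mathrm{SNR}\right) \le O(\mathrm{SNR}^{2+\epsilon}).$$

Combining this with (7) completes the proof. □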
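
The role of the term $\frac{rt}{l}\log(1 + \frac{l}{t}\mathrm{SNR})$ in the proof can be checked numerically. The sketch below evaluates this term at the block length $l = \frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2(1+\epsilon)}$ and at multiples of it, to illustrate that it shrinks as $l$ grows and is of the order of $\mathrm{SNR}^{2+\epsilon}$ at the threshold. The function names and the chosen values of SNR and $\epsilon$ are illustrative assumptions.

```python
import math


def estimation_penalty(r, t, l, snr):
    """(r*t/l) * log(1 + (l/t)*SNR): per-symbol bound on the rate lost
    because the channel matrix is unknown."""
    return (r * t / l) * math.log1p((l / t) * snr)


def lemma2_block_length(r, t, snr, eps):
    """Block length t^2/(r+t)^2 * SNR^(-2(1+eps)) used in the proof."""
    return t ** 2 / (r + t) ** 2 * snr ** (-2.0 * (1.0 + eps))


if __name__ == "__main__":
    r, t, snr, eps = 2, 2, 1e-3, 0.5
    l0 = lemma2_block_length(r, t, snr, eps)
    for k in (1, 2, 4, 8):
        penalty = estimation_penalty(r, t, k * l0, snr)
        print(k, penalty, penalty / snr ** (2 + eps))
```

The printed ratio stays bounded as the block length increases, which is the behaviour the proof relies on.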

We now introduce an input distribution that has a flashy as well as a continuous nature. A 
similar input distribution was first introduced in for achieving the order of the sublinear 
capacity term for a single-input, single-output, non-coherent Rayleigh block fading channel.
For a given $\alpha \in (0, 1]$, let us transmit in only a fraction $\delta(\mathrm{SNR}) = \mathrm{SNR}^{1-\alpha}$ of the blocks. As
we are in the low signal to noise ratio regime, $\delta(\mathrm{SNR}) \in (\mathrm{SNR}, 1]$. Since we concentrate the
power only over a fraction of the blocks, the signal to noise ratio for the blocks in which we
transmit increases to $\mathrm{SNR}'$, where

$$\mathrm{SNR}' = \frac{\mathrm{SNR}}{\delta(\mathrm{SNR})} = \mathrm{SNR}^{\alpha}.$$

In the blocks that we choose to transmit, let the entries of the input matrix $\mathbf{X}$ be i.i.d.
$\mathcal{CN}(0, \frac{\mathrm{SNR}'}{t})$.

Note that as we increase $\alpha$ from 0 to 1, the fraction of blocks that we transmit in increases from
SNR to 1. Therefore, as $\alpha$ increases, the distribution changes from a peaky to a continuous
one. We will call this type of signaling Peaky Gaussian. We prove the following theorem:



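Theorem 2 Consider a non-coherent Rayleigh block fading MIMO channel with average
signal to noise ratio SNR. Let the block length be $l$ and the capacity be $C(\mathrm{SNR})$. If we use
Peaky Gaussian signals over this channel, then for any $\alpha \in (0, 1]$ and $\epsilon \in (0, \alpha)$, if

$$l \ge l_G \triangleq \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(\alpha+\epsilon)},$$

then

$$C(\mathrm{SNR}) \ge C^*(\mathrm{SNR}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\epsilon}).$$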

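Proof: Let us use the Peaky Gaussian distribution for communicating over the
non-coherent MIMO channel. We can now apply Lemma 2 to the blocks that we choose to
transmit. Note that these blocks have a signal to noise ratio of $\mathrm{SNR}' = \mathrm{SNR}^{\alpha}$. Thus, for any $\epsilon' \in (0, 1]$,
if

$$l \ge \frac{t^2}{(r+t)^2}\,\mathrm{SNR}'^{-2(1+\epsilon')} = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(\alpha+\alpha\epsilon')},$$

then

$$C(\mathrm{SNR}') \ge r\,\mathrm{SNR}' - \frac{r(r+t)}{2t}\,\mathrm{SNR}'^{2} + O(\mathrm{SNR}'^{2+\epsilon'}).$$

Since we are transmitting in $\delta(\mathrm{SNR})$ fraction of the blocks,

$$C(\mathrm{SNR}) = \delta(\mathrm{SNR})\,C(\mathrm{SNR}') \ge r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\alpha\epsilon'}).$$

Note that for $\epsilon' \in (0, 1]$, $\alpha\epsilon' = \epsilon \in (0, \alpha]$. This completes the proof. □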

Thus, we see that using Peaky Gaussian signals, a rate of $C^*(\mathrm{SNR})$ is achievable if the
coherence length is greater than or equal to $l_G$.
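
A minimal sketch of how Peaky Gaussian inputs could be generated is given below: each block is used with probability $\mathrm{SNR}^{1-\alpha}$, and the entries of a used block are i.i.d. complex Gaussian with variance $\mathrm{SNR}^{\alpha}/t$, so that the average power over all blocks matches SNR. Choosing the active blocks by independent coin flips is an assumption made here for simplicity (the text only specifies the fraction of blocks used), the function name is hypothetical, and NumPy is assumed to be available.

```python
import numpy as np


def peaky_gaussian_blocks(t, l, snr, alpha, n_blocks, rng):
    """Draw n_blocks input blocks of size t x l.

    A block is active with probability delta = snr**(1 - alpha); active
    blocks have i.i.d. CN(0, snr**alpha / t) entries, inactive blocks are zero.
    """
    delta = snr ** (1.0 - alpha)
    snr_on = snr ** alpha  # per-block SNR when transmitting
    active = rng.random(n_blocks) < delta
    scale = np.sqrt(snr_on / (2.0 * t))
    shape = (n_blocks, t, l)
    x = scale * (rng.standard_normal(shape) + 1j * rng.standard_normal(shape))
    x[~active] = 0.0
    return x, active


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x, active = peaky_gaussian_blocks(t=2, l=8, snr=0.01, alpha=0.5, n_blocks=20000, rng=rng)
    # Average transmit power per symbol time, summed over the t antennas.
    print(active.mean(), np.sum(np.abs(x) ** 2) / (x.shape[0] * x.shape[2]))
```

Setting `alpha=1.0` makes every block active, which recovers the continuous Gaussian input.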

To reliably achieve any rate, the required coherence length using Peaky Gaussian signaling
is strictly greater than the required length (Theorem 1) using the optimal input distribution.
Thus, if $l^*$ is the coherence length needed to have a capacity of $C^*(\mathrm{SNR})$,

$$l_{\min} < l^* \le l_G.$$

However, for $\alpha \in (0, 1]$, as $\epsilon \to 0$, $l_G \to l_{\min}$. Hence, the Peaky Gaussian input distribution is
near-optimal for the non-coherent MIMO channel.
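
The two coherence-length thresholds can be compared numerically. The sketch below computes the lower bound $\frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2\alpha}$ from Theorem 1 and the Peaky Gaussian threshold $\frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2(\alpha+\epsilon)}$ from Theorem 2, and shows the latter approaching the former as $\epsilon$ shrinks. The function names `l_min` and `l_gauss` are hypothetical.

```python
def l_min(t, r, snr, alpha):
    """Theorem 1 bound: coherence length must exceed t^2/(r+t)^2 * SNR^(-2*alpha)."""
    return t ** 2 / (r + t) ** 2 * snr ** (-2.0 * alpha)


def l_gauss(t, r, snr, alpha, eps):
    """Theorem 2 threshold for Peaky Gaussian signaling:
    t^2/(r+t)^2 * SNR^(-2*(alpha+eps))."""
    return t ** 2 / (r + t) ** 2 * snr ** (-2.0 * (alpha + eps))


if __name__ == "__main__":
    t, r, snr, alpha = 2, 2, 0.01, 1.0
    print("l_min:", l_min(t, r, snr, alpha))
    for eps in (0.5, 0.1, 0.01):
        print(eps, l_gauss(t, r, snr, alpha, eps) / l_min(t, r, snr, alpha))
```

The ratio of the two thresholds equals $\mathrm{SNR}^{-2\epsilon}$, so the gap can be large at very low SNR unless $\epsilon$ is small.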

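Thus, from Theorems 1 and 2 we see that for any $\alpha \in (0, 1]$ and $\epsilon \in (0, \alpha)$, if

$$\frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\alpha} < l \le \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(\alpha+\epsilon)}, \qquad (9)$$

the sublinear capacity term is:

$$\Delta^{(t,r)}(\mathrm{SNR}) = \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\epsilon}).$$

We summarize this result in the following theorem:

Theorem 3 Consider a non-coherent Rayleigh block fading MIMO channel with average
signal to noise ratio SNR. For any $\alpha \in (0, 1]$ and $\epsilon \in (0, \alpha)$, the capacity of the channel is

$$C(\mathrm{SNR}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\epsilon})$$

if and only if there exists a $\sigma \in (0, \epsilon)$ such that

$$l = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(\alpha+\sigma)}.$$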

This theorem tells us the capacity of a non-coherent MIMO channel in the low SNR regime
and shows its dependence on the coherence length of the channel, number of receive and
transmit antennas and SNR. Note that the transmit antennas affect the sublinear capacity
term. Peaky Gaussian signals are near-optimal when communicating over this channel.
Note that cr is used in the theorem to parameterize 0. The theorem leads to the following 
corollary: 

Corollary 1 Consider a non-coherent Rayleigh block fading MIMO channel with average
signal to noise ratio SNR. For any $\alpha \in (0, 1]$ and $\epsilon \in (0, \alpha)$, the sublinear capacity term

$$\Delta^{(t,r)}(\mathrm{SNR}) = \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\epsilon})$$

if and only if there exists a $\sigma \in (0, \epsilon)$ such that

$$l = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(\alpha+\sigma)}.$$

In Theorem 3 and Corollary 1, $\alpha$ is used to indicate how close the channel capacity is to
the coherent and non-coherent extremes. The coherent channel corresponds to the case
when $\alpha = 1$ and the i.i.d. non-coherent channel corresponds to the case when $\alpha \to 0$. We
have also seen that Peaky Gaussian signals are near-optimal for the non-coherent MIMO channel.
Thus, with a channel coherence length of $l \sim \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\alpha}$, one should transmit Gaussian
signals in $\delta = \mathrm{SNR}^{1-\alpha}$ fraction of the blocks. At the coherent extreme, $\delta = \mathrm{SNR}^{0}$ and one
should transmit in all the blocks in order to achieve capacity. On the other hand, for the
i.i.d. Rayleigh fading channel (non-coherent extreme), one should only transmit in $\delta = \mathrm{SNR}^{1}$
fraction of the blocks. We shall study the non-coherent extreme with a finer scaling later on
in the paper.
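
The design rule described above can be turned into a small calculation: given a coherence length, invert $l \approx \frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2\alpha}$ to obtain $\alpha$, then transmit in a fraction $\mathrm{SNR}^{1-\alpha}$ of the blocks. The sketch below does this under the assumption that the leading-order relation is treated as an equality and that $\alpha$ is clipped to the range $[0, 1]$; both the clipping and the function names are illustrative choices, not taken from the paper.

```python
import math


def alpha_from_coherence_length(t, r, l, snr):
    """Solve l = t^2/(r+t)^2 * SNR^(-2*alpha) for alpha, clipped to [0, 1].

    alpha = 1 corresponds to the coherent extreme; values near 0 correspond
    to the non-coherent extreme. Requires 0 < snr < 1.
    """
    raw = math.log(l * (r + t) ** 2 / t ** 2) / (2.0 * math.log(1.0 / snr))
    return min(1.0, max(0.0, raw))


def transmit_fraction(snr, alpha):
    """Fraction of blocks used by Peaky Gaussian signaling: SNR^(1 - alpha)."""
    return snr ** (1.0 - alpha)


if __name__ == "__main__":
    t, r, snr = 2, 2, 0.01
    for l in (1, 25, 2500, 10 ** 6):
        a = alpha_from_coherence_length(t, r, l, snr)
        print(l, a, transmit_fraction(snr, a))
```

Coherence lengths beyond $\frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2}$ are mapped to $\alpha = 1$, meaning every block is used.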

Let us eliminate the parameter $\alpha$ from Corollary 1. Hence, the sublinear capacity term
becomes

$$\Delta^{(t,r)}(\mathrm{SNR}) = \frac{r}{2\sqrt{l}}\,\mathrm{SNR} + o\left(\frac{\mathrm{SNR}}{\sqrt{l}}\right).$$

From (1), we have

$$\log\left(\frac{E_n}{N_0}\right) \approx -\log(r) + \frac{1}{2\sqrt{l}}.$$

Hence, the minimum energy required to transmit an information bit decreases inversely with
the square root of the coherence length of the channel. Thus, energy efficiency improves as
the coherence length increases. These results apply only for $\alpha \in (0, 1]$. For channels whose
coherence time is larger than $\frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2}$, the sublinear capacity term remains $O(\mathrm{SNR}^2)$.
We now focus on the coherent and non-coherent extremes. 
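
The elimination of $\alpha$ can be written out step by step. The following is a sketch of the leading-order algebra only, assuming the coherence length is taken at its leading-order value and lower-order terms are dropped; it is not a substitute for the formal statement.

```latex
\begin{align*}
l \approx \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\alpha}
  &\;\Longrightarrow\;
  \mathrm{SNR}^{\alpha} \approx \frac{t}{(r+t)\sqrt{l}}, \\
\Delta^{(t,r)}(\mathrm{SNR}) \approx \frac{r(r+t)}{2t}\,\mathrm{SNR}\cdot\mathrm{SNR}^{\alpha}
  &\;\Longrightarrow\;
  \Delta^{(t,r)}(\mathrm{SNR}) \approx \frac{r}{2\sqrt{l}}\,\mathrm{SNR}, \\
\frac{\Delta^{(t,r)}(\mathrm{SNR})}{r\,\mathrm{SNR}} &\approx \frac{1}{2\sqrt{l}}.
\end{align*}
```

The last line is the quantity that enters the logarithm of the energy per information nat, which is why the energy penalty scales as the inverse square root of the coherence length.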

3.3 Coherent Extreme 

In this case, $\alpha = 1$ and from Theorem 3 we know that for $\epsilon \in (0, 1)$

$$C(\mathrm{SNR}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^2 + O(\mathrm{SNR}^{2+\epsilon})$$

iff there exists a $\sigma \in (0, \epsilon)$ such that

$$l = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2(1+\sigma)}. \qquad (10)$$

We see that provided the coherence length is large enough, the non-coherent capacity is the
same as the coherent capacity in the low SNR regime. Moreover, the Peaky Gaussian signal
is now completely continuous. Hence, when $l > \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2}$, the coherent and non-coherent
capacities are the same in the low SNR regime and continuous Gaussian signals are optimal
for both.



3.4 Non-coherent Extreme 

From Theorem 2 we see that as $\alpha \to 0$, $l \to 1$ and we have an i.i.d. Rayleigh fading channel.
In order to get the exact value of the sublinear capacity term for this channel, we need to 
know the precise value of a, which is not possible by this asymptotic analysis. We do the 
precise analysis in Appendix 3 and show that the capacity is^ 

C(SNR) = rSNR - A.;i;J(SNR). 

where. 



A[-::^(SNR) 



The capacity is achieved using an on-off input distribution that becomes increasingly "flashy"
at low SNR. This is consistent with our asymptotic analysis which showed that only a $\delta = \mathrm{SNR}$
fraction of the blocks should be used for transmission in the non-coherent extreme. Hence,
the result shows that besides on-off signaling being optimal for the single- input, single-output 
i.i.d Rayleigh fading channel ^3], it is also capacity achieving when multiple antennas are 
used. 



4 Error probability for the non-coherent MIMO channel

In this section, we study the block error probability for the non-coherent MIMO channel,
$P_{\mathrm{error}}^{\mathrm{block}}$, when maximum-likelihood decoding is used at the receiver. This error probability is
the average over the ensemble of codes when Peaky Gaussian signaling is used and can be
expressed as:

$$P_{\mathrm{error}}^{\mathrm{block}} = \Pr(\text{error} \mid \text{Block used for transmission}) \cdot \Pr(\text{Block used for transmission})$$
$$\qquad + \Pr(\text{error} \mid \text{Block not used for transmission}) \cdot \Pr(\text{Block not used for transmission}).$$



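Definition of ($\doteq$): Let $f(\mathrm{SNR})$ and $g(\mathrm{SNR})$ be functions of SNR. We denote $f(\mathrm{SNR}) \doteq g(\mathrm{SNR})$ if

$$\lim_{\mathrm{SNR} \to 0} \frac{\log f(\mathrm{SNR})}{\log g(\mathrm{SNR})} = 1.$$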
Since we use Peaky Gaussian signaling and the receiver is assumed to have perfect knowledge
of the blocks that are being used for transmission, we have

$$\Pr(\text{Block used for transmission}) = \delta(\mathrm{SNR}),$$
$$\Pr(\text{error} \mid \text{Block not used for transmission}) = 0.$$

Hence,

$$P_{\mathrm{error}}^{\mathrm{block}} = \delta(\mathrm{SNR}) \cdot \Pr(\text{error} \mid \text{Block used for transmission}).$$

If we consider the input matrix transmitted in a block, X, as a super symbol of dimension
$t \times l$, the channel is memoryless, since for each use of the channel an independent realization
of H is drawn. Hence, using the results in P , the error probability can be bounded as 

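$$\Pr(\text{error} \mid \text{Block used for transmission}) \le \exp[-E_r(R)],$$

where $E_r(R)$ is the random coding error exponent for the super symbol channel:

$$E_r(R) = \max_{\rho \in [0,1]} \left\{ E_0(\rho) - \rho R \right\},$$

where

$$E_0(\rho) = -\log \int \left[ \int q(X)\, p(Y \mid X)^{\frac{1}{1+\rho}}\, dX \right]^{1+\rho} dY,$$

$q(X)$ is the distribution of X, R is the transmission rate in nats per block used for transmission
and Y is the channel's output matrix.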

Since the signaling is Gaussian in the block used for transmission, 

q{X) = ^ exp [ - trace{X^X) 

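The range of R for which $E_r(R)$ is positive is:

$$0 \le R \le \frac{l}{\delta(\mathrm{SNR})}\, C(\mathrm{SNR}) = C^{\mathrm{block}}(\mathrm{SNR}), \qquad (11)$$

where $C^{\mathrm{block}}(\mathrm{SNR})$ is the non-coherent capacity per block.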
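If we express $l$ as

$$l = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\nu},$$

then, from the results in the capacity section

$$\delta(\mathrm{SNR}) = \mathrm{SNR}^{1-\min\{1,\nu\}},$$

$$C(\mathrm{SNR}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\min\{1,\nu\}} + o\!\left(\mathrm{SNR}^{1+\min\{1,\nu\}}\right),$$

$$C^{\mathrm{block}}(\mathrm{SNR}) = \frac{t^2}{(r+t)^2}\,\mathrm{SNR}^{-2\nu} \left[ r\,\mathrm{SNR}^{\min\{1,\nu\}} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{2\min\{1,\nu\}} \right] + o\!\left(\mathrm{SNR}^{-2[\nu-\min\{1,\nu\}]}\right).$$

The signal-to-noise ratio in the block used for transmission, $\mathrm{SNR}_b$, is:

$$\mathrm{SNR}_b = \frac{\mathrm{SNR}}{\delta(\mathrm{SNR})} = \mathrm{SNR}^{\min\{1,\nu\}}.$$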

The main result is summarized in the following theorem, which is proved in Section 4.2.

Theorem 4 The block error probability for a non-coherent Rayleigh block fading MIMO
channel, $P_{\mathrm{error}}^{\mathrm{block}}$, when maximum-likelihood decoding is used at the receiver can be upper
bounded as:



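$$P_{\mathrm{error}}^{\mathrm{block}} \le \mathrm{SNR}^{1-\min\{1,\nu\}} \exp[-E_r(R)],$$

where,

$$E_r(R) = \begin{cases}
rt \log\left(1 + \dfrac{t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{2(t+r)^2}\right) - R - o(1), & R \in [0, R_{\mathrm{critical}}] \\[2ex]
rt \log\left(1 + \dfrac{\rho^* t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho^*)}\right) - \rho^* R - o(1), & R \in [R_{\mathrm{critical}}, C_T^{\mathrm{block}}(\mathrm{SNR})] \\[2ex]
o(1), & R \in [C_T^{\mathrm{block}}(\mathrm{SNR}), C^{\mathrm{block}}(\mathrm{SNR})] \\[1ex]
0, & R \in [C^{\mathrm{block}}(\mathrm{SNR}), \infty),
\end{cases}$$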



and 



\ 



,rt (t + rj^SNR'-""'''"' , 



^critical 

C^P(SNR) 



rt/2 + o(l), 
SNR 



(r + ty 



_r(r-M) 2 min{l,4 



2t 



5|\|R2min{l,!/} _^ o ^SN R™'"'^'""'" """2^'"^ .2 min{l,iy}} 



4.1 Discussion of Theorem 4

Theorem 4 divides the range of rates for which $E_r(R)$ is positive into three regions, A, B
and C, which are illustrated in Figure 3. Let us consider region A: $R \in [0, R_{\mathrm{critical}}]$. Since
$R_{\mathrm{critical}} = O(1)$ and $C^{\mathrm{block}}(\mathrm{SNR}) = O(\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]})$, the critical rate is much smaller than
the channel capacity:

$$R_{\mathrm{critical}} \ll C^{\mathrm{block}}(\mathrm{SNR}).$$

Region A is an $O(\mathrm{SNR}^{2\nu-\min\{1,\nu\}})$ fraction of the capacity and is very small in the wideband
regime. The cut-off rate, $R_{\text{cut-off}}$, is given by

$$R_{\text{cut-off}} = E_r(0) = rt \cdot [2\nu - \min\{1,\nu\}] \cdot \log\left(\frac{1}{\mathrm{SNR}}\right).$$

Since the cut-off rate is an $O\!\left(\mathrm{SNR}^{2\nu-\min\{1,\nu\}} \cdot \log\left(\frac{1}{\mathrm{SNR}}\right)\right)$ fraction of the capacity, it is much
smaller than the capacity in the wideband regime:

$$R_{\text{cut-off}} \ll C^{\mathrm{block}}(\mathrm{SNR}).$$
Figure 3: Random coding error exponent for the non-coherent MIMO channel at low SNR.

Let us consider the third region over which $E_r(R)$ is positive, region C: $R \in [C_T^{\mathrm{block}}(\mathrm{SNR}), C^{\mathrm{block}}(\mathrm{SNR})]$.
This interval is a $[C^{\mathrm{block}}(\mathrm{SNR}) - C_T^{\mathrm{block}}(\mathrm{SNR})]/C^{\mathrm{block}}(\mathrm{SNR})$ fraction of the capacity where,



^biock(SNR) 



o(SNR) 



V < 



V > 



Hence, region C is also a very small fraction of the capacity in the wideband regime. Therefore,
we can conclude that it is region B: $R \in [R_{\mathrm{critical}}, C_T^{\mathrm{block}}(\mathrm{SNR})]$, that dominates the range
of rates in the wideband regime.

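From Theorem 4, the error probability in Region B can be expressed as:

$$P_{\mathrm{error}}^{\mathrm{block}} \doteq \mathrm{SNR}^{1-\min\{1,\nu\}} \left( \frac{R}{r\,l\,\mathrm{SNR}^{\min\{1,\nu\}}} \right)^{rt}.$$

To observe this, let us consider the error exponent for

$$R = l \cdot r\,\mathrm{SNR}^{\kappa}, \qquad \min\{1,\nu\} < \kappa < 2\nu.$$

This rate lies in Region B and the optimum $\rho$ is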



Substituting in Theorem HI we observe 
Er{R) 



rtloK I 1 + ^ — ]-p*R- oil] 



rt log 



(t + r)2(l + p*) 

I ■ SNR'^''^^^''^^" 
R 



(p'=o(-fe)) 



(12) 



(13) 
For $\nu \le 1$, $\mathrm{SNR}^{\min\{1,\nu\}} \propto 1/\sqrt{l}$. Hence, for a fixed rate R, the error probability decays

inversely with the coherence length in the following way:



/ , rt-l 

1^ ' /<SNR-2 



pblock I 
error ^ » 



Let us now examine the effect of antennas on the error probability. To analyze this, we
propose a definition of "diversity" in the low SNR/wideband regime.
Let $P$ and $W$ be the total received power and system bandwidth, respectively. High SNR
diversity is commonly defined as:

$$d_H(W) \triangleq -\lim_{P \to \infty} \frac{\log P_{\mathrm{error}}^{\mathrm{block}}}{\log P}.$$

This definition describes the asymptotic behavior of error probability with received power,
for fixed bandwidth.

In the low SNR/wideband regime, we define diversity, $d_L(P)$, as:

$$d_L(P) \triangleq -\lim_{W \to \infty} \frac{\log P_{\mathrm{error}}^{\mathrm{block}}}{\log W}.$$

This definition describes the asymptotic behavior of error probability with bandwidth, for
fixed received power. Since $\mathrm{SNR} \propto 1/W$, an equivalent definition of low SNR diversity is:

$$d_L = \lim_{\mathrm{SNR} \to 0} \frac{\log P_{\mathrm{error}}^{\mathrm{block}}}{\log \mathrm{SNR}}. \qquad (14)$$

From the error probability in Region B and (14), we have

$$d_L = r \cdot t\,[\kappa - \min\{1,\nu\}] + 1 - \min\{1,\nu\}.$$



Hence, we conclude that the decay in error probability is exponential with the product of the
number of transmit and receive antennas, $r \cdot t$. Similar to the high SNR regime, the product
of the number of transmit and receive antennas comes about as a diversity factor in the low
SNR regime. Hence, we conjecture that $r \cdot t$ is a diversity factor for a MIMO channel at any
SNR.
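The low SNR diversity can be illustrated with a short Python sketch. It assumes the leading-order Region B behavior $P_{\mathrm{error}}^{\mathrm{block}} \doteq \mathrm{SNR}^{1-\min\{1,\nu\}} \left(R/(r\,l\,\mathrm{SNR}^{\min\{1,\nu\}})\right)^{rt}$ with $l = \frac{t^2}{(r+t)^2}\mathrm{SNR}^{-2\nu}$ and $R = l\,r\,\mathrm{SNR}^{\kappa}$, ignoring constants and lower-order terms. The function names are hypothetical.

```python
import math


def region_b_error(snr, nu, kappa, t, r):
    """Leading-order Region B error probability (constants ignored)."""
    m = min(1.0, nu)
    l = t ** 2 / (r + t) ** 2 * snr ** (-2.0 * nu)   # coherence length
    rate = l * r * snr ** kappa                       # rate inside Region B
    return snr ** (1.0 - m) * (rate / (r * l * snr ** m)) ** (r * t)


def diversity_formula(nu, kappa, t, r):
    """d_L = r*t*(kappa - min{1,nu}) + 1 - min{1,nu}."""
    m = min(1.0, nu)
    return r * t * (kappa - m) + 1.0 - m


def diversity_numeric(snr, nu, kappa, t, r):
    """Slope log(P_error)/log(SNR) evaluated at a finite SNR."""
    return math.log(region_b_error(snr, nu, kappa, t, r)) / math.log(snr)
```

For instance, with $t = r = 2$, $\nu = 0.5$ and $\kappa = 0.8$, both functions give a slope of about 1.7, and the slope grows linearly in the product $r \cdot t$.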

In the capacity section of this paper, we have seen that receive antennas have greater significance
than transmit antennas since the former affect the linear as well as the sublinear
capacity term whereas the latter affect only the sublinear term. However, since the error
probability decays exponentially with $r \cdot t$, the transmit antennas have the same importance
as receive antennas in terms of reliability. This emphasizes the importance of multiple transmit
antennas in the wideband regime.

Footnote: We omit the argument of $d_L$ for simplicity.

Let us now consider channel outage in the low SNR regime. For a block fading channel,
outage occurs in a coherence block when the channel matrix is so ill-conditioned that the
block mutual information cannot support the target block data rate. We denote the outage
probability as $P_{\mathrm{outage}}$ and present a heuristic computation to show that

Thus, we see that in the low SNR/wideband regime, for rates away from capacity, the error 
probability is dominated by the outage probability. Hence, like at high SNR, channel outage 
is the major source for errors even at low SNR. 

Heuristic Proof: The outage probability can be upper bounded using a training based scheme
(this scheme is described in detail in the proof of Theorem 4). We directly state the channel
model when this scheme is used for data transmission (the first t symbols are used for
training):

$$y_i = H' x_i + v'_i, \qquad i = t+1, \ldots, l,$$

where $H'$ has i.i.d. $\mathcal{CN}(0, 1)$ entries and is perfectly known at the receiver (this is the MMSE
channel estimate), $v'_i$ is a zero-mean noise vector having the covariance matrix



r-i 



and, {xj} are i.i.d complex Gaussian vectors: 



where. 



Now, 



/*(SNR) = SNR"^'°^^''^> + ©(SNR^"'"^^^''^^). 



P 

outage 



= Pr (^/(xt+i, . . . ,x«;yt+i, . . . ,y/|H') < i?^ 

< Pr (^logdet (j, + /^^^^H'tH') < ^) (15) 

< Pr (log (l + ^^^^trace(H'tH')) < ^) (16) 

Equation ()15|) follows since the mutual information is minimized if {v^} are i.i.d complex 
Gaussian [TTl[Tn]. In (fTBj). we use the inequality: 

$$\det\left(I_t + \frac{f^*(\mathrm{SNR})}{t} H'^{\dagger} H'\right) \ge 1 + \frac{f^*(\mathrm{SNR})}{t}\,\mathrm{trace}(H'^{\dagger} H').$$
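The determinant inequality holds for any positive semidefinite matrix: writing the determinant as a product over nonnegative eigenvalues $\lambda_i$, $\prod_i (1 + c\lambda_i) \ge 1 + c\sum_i \lambda_i$ because every cross term is nonnegative. The Python sketch below checks this on random nonnegative eigenvalues; the eigenvalue distribution, the range of $c$ and the function names are arbitrary illustrative choices.

```python
import random


def det_identity_plus(c, eigenvalues):
    """det(I + c*A) for a PSD matrix A, computed from its eigenvalues."""
    prod = 1.0
    for lam in eigenvalues:
        prod *= 1.0 + c * lam
    return prod


def trace_lower_bound(c, eigenvalues):
    """1 + c*trace(A), the lower bound used in the outage computation."""
    return 1.0 + c * sum(eigenvalues)


def check_inequality(trials=1000, dim=4, seed=0):
    """Return True if det(I + c*A) >= 1 + c*trace(A) in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        c = rng.uniform(0.0, 5.0)
        eig = [rng.expovariate(1.0) for _ in range(dim)]
        # small relative slack guards against floating-point rounding
        if det_identity_plus(c, eig) < trace_lower_bound(c, eig) * (1.0 - 1e-12):
            return False
    return True
```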
In (17), $\chi_{rt}$ represents $\mathrm{trace}(H'^{\dagger} H')$ and is a chi-squared random variable with rt degrees of
freedom. Hence, if we choose the rate in Region B as in (13), we have for low SNR,



R 



//*(SNR) 



Hence, 



outage 



block 



< 1. 



R 



rt 



□ 



4.2 Proof of Theorem 4

Upper Bound to $E_r(R)$:

We first establish an upper bound to $E_r(R)$ by providing the receiver perfect knowledge of H.
Let us denote the random coding error exponent for this coherent channel by $E_r^c(R)$. Since
the error probability for the coherent channel cannot be greater than that of the channel without
knowledge of H, we have

$$E_r(R) \le E_r^c(R), \qquad (18)$$

where,

$$E_r^c(R) = \max_{\rho \in [0,1]} \left\{ E_0^c(\rho) - \rho R \right\},$$
and 



The computation of Eq{p), when / = 1, is done in p. Here, we do the computation for 
arbitrary /. The following lemma specifies an upper bound to E^^p): 



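Lemma 3

$$E_0^c(\rho) \le rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right).$$

Proof: Since H is independent of X,

$$p(Y, H \mid X) = p(H)\, p(Y \mid X, H).$$

Hence, $E_0^c(\rho)$ can be expressed as

$$E_0^c(\rho) = -\log E_H\left[ \int \left( \int q(X)\, p(Y \mid X, H)^{\frac{1}{1+\rho}}\, dX \right)^{1+\rho} dY \right].$$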
The conditional probability $p(Y \mid X, H)$ is given by



p(y|X,/f)^(5^)"exp 



SNRf 



trace I [hX - ^ (^HX - ^) } 



Defining B as 



B 4 



SNRf, 



In the proof of this lemma, for any matrix M, we use to denote its pseudoinverse. 
Now, 



q{X)p{Y\X, H)^pdX 

trace{X^X) 



1 



1 /5NRbv+p 



/SNR, 



6\ i+P 



exp 



exp 



1 /SNRf 



exp 



V TTt 

- tracelx'' (It + B)X - 



SNRf, 



trace|(M-y)t(M-y)} 



t(l + p) 
SNR, 



trace|F^(7^ - (7^ + B-^)-^)Y 



{X^H^Y + Y^HX - Y^Y) | j 
SNRf, 



exp 



•trace|x^i7^(5-^ + QHX - X^H^Y - Y^HX + Y^Ir + B-^)-^Y^ 



dX 



SNRftMT^ 



Therefore, 



exp 



- ^|^trace{Ft(/, _ (/^ + B-Y')y}] det(/, + 5)"' 



P) 



q{X)p{Y\X, H)^pdX 

(^^y'det(/, + i3)-^(^+^) I exp 

det(Jt + det (^/^ - {Ir + 5"^)-^ 

det(/t + B)-P' 



SNR, 



det It + 



SNRf, 



-pi 



t{l + p) J 



Hence, 



-log -Eh 



det I 



SNRf, 



H^H 
= - log Eh
< -log£;H 
= - log 



exp ( —pi log det ( It 

^ trace(H"fH 



SNRb 



H+H 



exp — 



i(l + p) 



(20) 



(^^-1)! 
p/SNRfo 



exp 



1 + 



p/SNRfe 

t(l+p) 



X 1 (ix 



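To obtain (20), we use the following inequality:

$$\log\det\left(I_t + \frac{\mathrm{SNR}_b}{t(1+\rho)} H^{\dagger} H\right) = \sum_{i=1}^{\min(t,r)} \log\left(1 + \frac{\mathrm{SNR}_b}{t(1+\rho)} \lambda_i(H^{\dagger} H)\right) \le \frac{\mathrm{SNR}_b}{t(1+\rho)} \sum_{i=1}^{\min(t,r)} \lambda_i(H^{\dagger} H) = \frac{\mathrm{SNR}_b}{t(1+\rho)}\,\mathrm{trace}(H^{\dagger} H).$$

$\lambda_i(H^{\dagger} H)$ is the $i$-th eigenvalue of the random matrix $H^{\dagger} H$. Hence, $E_0^c(\rho)$ can be upper
bounded as: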



$$E_0^c(\rho) \le rt \log\left(1 + \frac{\rho\, l\,\mathrm{SNR}_b}{t(1+\rho)}\right) = rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right).$$

This completes the proof of the lemma. □

Combining (18) with Lemma 3 we obtain an upper bound for $E_r(R)$:

$$E_r(R) \le \max_{\rho \in [0,1]} \left\{ rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right) - \rho R \right\}. \qquad (21)$$

Since $E_r(R)$ is positive over the rate range (11), any upper bound to it will also be positive
over (11). In fact, since perfect knowledge of H at the receiver increases capacity, the upper
bound is positive over a rate range larger than (11).

Lower Bound to $E_r(R)$:

We now use a training based scheme to obtain a lower bound on $E_r(R)$. Since this is one
of the possible schemes that can be used for the non-coherent channel, the random coding
error exponent for this scheme, $E_r^T(R)$, can be upper bounded as

$$E_r^T(R) \le E_r(R). \qquad (22)$$
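We rewrite the channel model within one coherence block as

$$y_i = H x_i + w_i, \qquad i = 1, \ldots, l. \qquad (23)$$

The channel matrix, H, is constant within the block. The total energy available in the block
is:

$$E_{\mathrm{total}} = l \cdot \mathrm{SNR}_b.$$

We use the first t symbols of the block for training (we assume $l > t$), using a $\gamma \in (0, 1)$ fraction
of the total energy. The remaining fraction is used for communicating data. Hence, the energy
used for training is:

$$E_{\mathrm{training}} = \gamma E_{\mathrm{total}} = \gamma\, l\,\mathrm{SNR}_b.$$

The following training sequence is used:

$$[x_1 \cdots x_t] = \sqrt{\frac{\gamma\, l\,\mathrm{SNR}_b}{t}}\; I_t.$$

This training scheme makes $y_{ij}$ a sufficient statistic for estimating $h_{ij}$. The receiver computes
the minimum mean-squares error (MMSE) estimate of H from $[y_1 \ldots y_t]$. Using
$\hat{h}_{ij}$ and $\tilde{h}_{ij}$ to denote the estimate and estimation error of $h_{ij}$, respectively, we have for
$i \in \{1, \ldots, r\}$, $j \in \{1, \ldots, t\}$:

$$\hat{h}_{ij} \sim \mathcal{CN}\left(0, \frac{\gamma\, l\,\mathrm{SNR}_b}{t + \gamma\, l\,\mathrm{SNR}_b}\right), \qquad \tilde{h}_{ij} \sim \mathcal{CN}\left(0, \frac{t}{t + \gamma\, l\,\mathrm{SNR}_b}\right),$$

and $\hat{h}_{ij}$, $\tilde{h}_{ij}$ are independent due to the estimation being MMSE. Moreover, the sets $\{\hat{h}_{ij}\}$
and $\{\tilde{h}_{ij}\}$ have independent elements. Thus, representing the estimate and estimation error
of the channel matrix as $\hat{H}$ and $\tilde{H}$, respectively, we have

$$H = \hat{H} + \tilde{H},$$

where $\hat{H}$ and $\tilde{H}$ are independent matrices, each with i.i.d. Gaussian entries.
For the remaining $l - t$ symbols within the same block, $E_{\mathrm{total}} - E_{\mathrm{training}} = (1 - \gamma)\, l\,\mathrm{SNR}_b$
energy is used to send data using an i.i.d. Gaussian code. The channel in this phase can be
represented as

$$y_i = \hat{H} x_i + \tilde{H} x_i + w_i, \qquad i = t+1, \ldots, l. \qquad (24)$$

$\{x_i\}$ are i.i.d. complex Gaussian vectors:

$$x_i \sim \mathcal{CN}\left(0, \frac{(1-\gamma)\, l\,\mathrm{SNR}_b}{t(l-t)}\, I_t\right).$$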
$\tilde{H} x_i$ is the noise due to the estimation error from the training phase coupled with the input
signal. Combining the additive white noise with the noise due to estimation error, we have

$$v_i = \tilde{H} x_i + w_i.$$

Note that $v_i$ is uncorrelated with, but not independent of, $\hat{H} x_i$. Its covariance matrix is

$$E[v_i v_i^{\dagger}] = \left[ \frac{t(1-\gamma)\, l\,\mathrm{SNR}_b}{(l-t)(t + \gamma\, l\,\mathrm{SNR}_b)} + 1 \right] I_r.$$

The channel in (24) can be normalized to:

$$y_i = H' x_i + v'_i, \qquad i = t+1, \ldots, l, \qquad (25)$$

where $H'$ has i.i.d. $\mathcal{CN}(0, 1)$ entries and is perfectly known at the receiver (this is the MMSE
estimate), $v'_i$ is a zero-mean noise vector having the covariance matrix

$$E[v'_i v'^{\dagger}_i] = I_r,$$

and $\{x_i\}$ are i.i.d. complex Gaussian vectors:

$$x_i \sim \mathcal{CN}\left(0, \frac{f(\gamma, \mathrm{SNR})}{t}\, I_t\right),$$

where

$$f(\gamma, \mathrm{SNR}) = \frac{\dfrac{\gamma\, l\,\mathrm{SNR}_b}{t + \gamma\, l\,\mathrm{SNR}_b} \cdot \dfrac{(1-\gamma)\, l\,\mathrm{SNR}_b}{l - t}}{\dfrac{t(1-\gamma)\, l\,\mathrm{SNR}_b}{(l-t)(t + \gamma\, l\,\mathrm{SNR}_b)} + 1}.$$

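Now,

$$f(\gamma, \mathrm{SNR}) = l\,\mathrm{SNR}_b^2 \cdot \frac{\gamma(1-\gamma)\, l}{t(1-\gamma)\, l\,\mathrm{SNR}_b + (l-t)(t + \gamma\, l\,\mathrm{SNR}_b)}$$
$$\ge l\,\mathrm{SNR}_b^2 \cdot \frac{\gamma(1-\gamma)}{\gamma\, l\,\mathrm{SNR}_b + t(1 + \mathrm{SNR}_b)}$$
$$= \frac{l\,\mathrm{SNR}_b^2}{t(1 + \mathrm{SNR}_b)} \cdot \frac{\gamma(1-\gamma)}{1 + \gamma \dfrac{l\,\mathrm{SNR}_b}{t(1 + \mathrm{SNR}_b)}}. \qquad (26)$$

Define

$$f^*(\mathrm{SNR}) = \max_{\gamma \in (0,1)} f(\gamma, \mathrm{SNR}).$$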
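The maximization over the training energy fraction can be explored numerically. The sketch below assumes the lower bound has the form $\frac{\gamma(1-\gamma)}{1 + K\gamma}$ up to a factor that does not depend on $\gamma$, with $K = \frac{l\,\mathrm{SNR}_b}{t(1+\mathrm{SNR}_b)}$. Setting the derivative to zero gives $K\gamma^2 + 2\gamma - 1 = 0$, whose root in $(0,1)$ is $\gamma^* = (\sqrt{1+K} - 1)/K$. The function names and the symbol $K$ are introduced here only for illustration.

```python
import math


def objective(gamma, k):
    """gamma*(1-gamma)/(1+k*gamma): the gamma-dependent factor of the bound."""
    return gamma * (1.0 - gamma) / (1.0 + k * gamma)


def best_fraction_closed_form(k):
    """Root of k*g^2 + 2*g - 1 = 0 that lies in (0, 1)."""
    return (math.sqrt(1.0 + k) - 1.0) / k


def best_fraction_grid(k, steps=100000):
    """Brute-force maximiser over a uniform grid on (0, 1)."""
    best_gamma, best_value = 0.0, -1.0
    for n in range(1, steps):
        gamma = n / steps
        value = objective(gamma, k)
        if value > best_value:
            best_gamma, best_value = gamma, value
    return best_gamma, best_value
```

For large $K$ the optimal fraction behaves like $1/\sqrt{K}$, so only a vanishing share of the block energy is spent on training when the coherence block is long.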
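Using (26), we get a lower bound to $f^*(\mathrm{SNR})$:

$$f^*(\mathrm{SNR}) \ge \frac{l\,\mathrm{SNR}_b^2}{t(1 + \mathrm{SNR}_b)} \max_{\gamma \in (0,1)} \frac{\gamma(1-\gamma)}{1 + \gamma \dfrac{l\,\mathrm{SNR}_b}{t(1 + \mathrm{SNR}_b)}}$$
$$= \mathrm{SNR}^{\min\{1,\nu\}} - 2\,\frac{(r+t)}{\sqrt{t}}\,\mathrm{SNR}^{\nu + \frac{\min\{1,\nu\}}{2}} + o\!\left(\mathrm{SNR}^{\nu + \frac{\min\{1,\nu\}}{2}}\right) \triangleq f_{LB}(\mathrm{SNR}).$$

Hence,

$$f^*(\mathrm{SNR}) \ge f_{LB}(\mathrm{SNR}) = \mathrm{SNR}^{\min\{1,\nu\}} - 2\,\frac{(t+r)}{\sqrt{t}}\,\mathrm{SNR}^{\nu + \frac{\min\{1,\nu\}}{2}} + o\!\left(\mathrm{SNR}^{\nu + \frac{\min\{1,\nu\}}{2}}\right).$$

Note that

$$f_{LB}(\mathrm{SNR}) = \mathrm{SNR}^{\min\{1,\nu\}} + o(\mathrm{SNR}^{\min\{1,\nu\}}).$$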
The random coding error exponent for this scheme is

$$E_r^T(R) = \max_{\rho \in [0,1]} \left\{ E_0^T(\rho) - \rho R \right\},$$



(27) 



(28) 



where, 



Since the training and data communication phases use independent input signals, H' is 
independent of X. Thus 

piY,H'\X)=p{H')piY\X,H'), 

and 



The following lemma specifies a lower bound to $E_0^T(\rho)$:

Lemma 4

$$E_0^T(\rho) \ge rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right) - o(1).$$

Proof: References ITTI] show that capacity of the channel in ()25j) is minimized if {vj} are 
i.i.d Gaussian: 

$$v'_i \sim \mathcal{CN}(0, I_r), \qquad i = t+1, \ldots, l.$$
We conjecture that this noise distribution also minimizes the error exponent. With this assumption,
the error exponent for this channel with i.i.d. Gaussian noise is similar to that of the
coherent channel ($E_0^c(\rho)$ in (19)), with $\mathrm{SNR}_b$ replaced by $f(\gamma, \mathrm{SNR})$ and $l$ replaced by $l - t$.
Hence, we obtain the following lower bound:



where. 



Equation 



det \^It + 
det ( It + 



> max < — log -Ett' 



log 
log 



> - log Eh' 



>-logEH' 
= - log 



t(l+p) 



-p{l-t) 



max^g(o,i)/(7,SNR) ^,^^,^ *^ 
t{l + p) 



-p{i-t)' 



,^/M(SNR),,3ce(H'tH" 



det It + 



ta + p) 



°° x*"* ^ exp(— x) 



-log 
-log a 



X 



(rt- 1)! 

rt-l 

exp 



t{l + p) 



-p{i-t) 



-p{i-t) 



dx 



{rt - 1)! 



-x-p{l-t)\og (l + ^ 



/2b(SNR) 

P) 



C 



exp 



-x-p{l-t)\og + ^ 



(rt-l)! 

holds due the following inequality: 



/£b(SNR) 

p) 



X 



dx. 



det(/. + ^ 



/£b(SNR). 



HtH 



t{l + p) 



1=1 



i(l+P) 



> 1 + 



/£b(SNR) 



min(t,r) 



.l + %(5^.,3ce(H'.H'). 

Ai(H''''H') is the i^^ eigenvalue of the random matrix H'"!"!!'. 

We now compute an upper bound to C . Splitting the range of integration, we have 

C = Ci + C2, 



(29) 



(30) 



(31) 



(32) 
where,



C2 



(^^-1)! 

rt-1 



exp 



tii + p) ' 



X 



ht(i+p) 

Ci can be upper bounded as: 



exp 



-X - p{l - t) log (l + 



< 



< 



< 



2i(l+p) 3,ri-l 





2t(l+p) 



(rt-1)! 

exp 



exp 



exp 



-X 



p(/-t)/2^(SNR)x / /£b(SNR) 
t(l + p) V 2tfl + /)) 



2t(l + p) 



X 



dx 



_ p(/-t)/2^(5NR)(l-/£^(SNR)) ^- 



i(l + p) 



00 2;''*"-'- 



(rt-1)! 



_^ _ p(/-t)/£^(SNR)(l-/£^(SNR)) ^- 



t(l + p) 



dx 



^ p(/-t)/2^(SNR)(l-/£^(SNR)) 



+ 
1 + 

1 + 
1 + 
1 + 



i(l + P) 

ptSNR-P^-'^'"^^''^^] 
(t + r)2(l + p) 

ptSNR-[2'^~"^'°^^''^^] 
(t + r)2(l + p) 



i(l + p) 

/ - t) fsNR'"'"^^-'^^ + o(SNR'"'"^^i''^^)) f 1 - SNR"^'°^i'^> + o(SNR"^'°^^''^^) 



(1 _ il±ll!sNR2-)(i + o(i))(i _ SNR"^'"^^''^^ + o(SNR"''"^i''^>)) 



(1 + 0(1) 



-rt 



ptSNR 



(t + r)2(l + p) 
C2 can be upper bounded as 

< 



-rt 



1+0(1) 



X 



rt-1 



< 



2t(i+p) (rt-1)! 

00 rf,rt-l 

exp 



exp 



Combining 
C 



- p(/ - t) log (^l + 2/£^(SNR) 

x-p(/-t)log fl + 2/£^(SNR) 
-p(/-t) log(^l + 2/£^(SNR) 
we get the upper bound for C as: 



dx 



dx 



(34) 



(35) 



< 



1 + 



ptSNR 



-[2i/-min{l,z/}] 



n ~rt 



(t + r)2(l + p) 



1 + 0(1) 



+ exp 



p(/-t)log l + 2/2^(SNR) 
1 +



(t + r)2(l + p) 



-rt 



l + o(l) 1+ l + o(l) 



(t + r)2(l + p) 



• exp 



p[ -rSNR-'" - t] log(l + 2(SNR"^''^^i''^> + ofSNR^^'^^i''^^))) 



1 + 



(t + r)2(l + p) 



-rt 



-rt 



l + o(l) 1+ l + o(l) 



1 + 0(1) 



ofll 



(t + r)2(l +p) 

From (jnOlinni), we get a lower bound to -Eq (p) as: 

ptSNR-t^'^-"""^^'''^] 
(t + r)2(l + p) 



(36) 



This completes the proof of the lemma. 

Combining ()221 128^ with Lemma 01 we obtain a lower bound for E^{R): 



EAR) > max < rtlog 1 

pe[o,i] 



(t + r)2(l + p) 



pR\-o{l] 



(37) 
□ 

(38) 



Since the training based scheme has a lower capacity than the non-coherent capacity [TH ll9j. 
the range of rates for which the error exponent for the training based scheme is positive, is 
reduced from (jllll . We compute a lower bound to the capacity for this scheme. 
Letting {v^} be i.i.d white Gaussian vectors, i.e., ~ CA/'(0, Ij) in (j2SI), we can lower bound 
the capacity per block used for transmission, Cj°'^'^(SNR), as 



Cr (SNR) 

> (I — t). max I logdet 

7e{o,i; 



It + 



/(7,SNR) 
t 



H'tH' 



= (/-t). logdet 
= (/-t). logdet 
> (/ -t). logdet 



^ max^g(o,i)/(7,SNR) ^,^^; 
t 

■ /;b(snr)^ 



>(/-t). 

> 



^^/II(SNR) 



(t + r)' 



SNR 



-2u 



■ n 1 r(r -\- t) , min{l,i^} 

^5^pmm{l,.} _ 2lll^SNR'^+^^ 



.r(r + t)5^p,2min{lM+o/^SNR" 

2t V 



^|^^rnmiLfi,2min{l,!/}} 



= C^i^'^(SNR). 

(39) 
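Hence, the lower bound to $E_r(R)$ in (38) is positive in the range

$$0 \le R \le C_T^{\mathrm{block}}(\mathrm{SNR}).$$

Combining the upper and lower bounds:
Combining (21) and (38), we have

$$E_r(R) = \max_{\rho \in [0,1]} \left\{ rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right) - \rho R \right\} - o(1), \qquad 0 \le R \le C_T^{\mathrm{block}}(\mathrm{SNR}).$$

Let

$$\rho^* = \arg\max_{\rho \in [0,1]} \left\{ rt \log\left(1 + \frac{\rho\, t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2(1+\rho)}\right) - \rho R \right\}.$$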



We compute p* as: 



where $R_{\mathrm{critical}}$, the critical rate, is

$$R_{\mathrm{critical}} = rt/2 + o(1).$$
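The behavior of the maximizing $\rho$ and of the critical rate can be illustrated with a small Python sketch. It assumes the exponent has the form $\max_{\rho \in [0,1]} \{ rt \log(1 + \frac{a\rho}{1+\rho}) - \rho R \}$ with $a = \frac{t\,\mathrm{SNR}^{-[2\nu-\min\{1,\nu\}]}}{(t+r)^2}$, and it drops the $o(1)$ terms. The shorthand $a$ and all function names are introduced here for illustration only. Because the objective is concave in $\rho$, the maximizer is the stationary point clipped to $[0, 1]$.

```python
import math


def e0(rho, a, r, t):
    """Assumed form r*t*log(1 + a*rho/(1+rho)) of the Gallager function."""
    return r * t * math.log(1.0 + a * rho / (1.0 + rho))


def optimal_rho(rate, a, r, t):
    """Maximiser of e0(rho) - rho*rate over [0, 1].

    Stationarity gives (1+rho)*(1+(1+a)*rho) = r*t*a/rate, a quadratic in rho.
    """
    c = 1.0 - r * t * a / rate
    disc = (2.0 + a) ** 2 - 4.0 * (1.0 + a) * c
    rho = (-(2.0 + a) + math.sqrt(disc)) / (2.0 * (1.0 + a))
    return min(1.0, max(0.0, rho))


def error_exponent(rate, a, r, t):
    """Random coding exponent without the o(1) corrections."""
    rho = optimal_rho(rate, a, r, t)
    return e0(rho, a, r, t) - rho * rate


def critical_rate(a, r, t):
    """Largest rate for which the maximiser is rho = 1."""
    return r * t * a / (2.0 * (2.0 + a))
```

As $a$ grows (low SNR), `critical_rate` approaches $rt/2$, in line with the value stated above; for rates below it the maximizer stays at $\rho = 1$, and above it the maximizer moves into the interior of $[0, 1]$.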

Substituting p* in (jlOJ, we have for < i? < -Rcriticai) 



< R < -Rcriticai 

identical <i?<C^','^^'(SNR) 



Er{R) = rtlog I 1 + 
and, for /^critical <R< ^^'-^(SNR) 

Er{R) 

( 



2(t + r) 



-R-ofll 



rt log 



1 + 



1 + 4 



R 



(t + r)2 



2" 



1 + 4 — - ^ ■ - 1 



R 



(40) 



(41) 



0(1). 

(42) 



For $R \in \left(C_T^{\mathrm{block}}(\mathrm{SNR}), C^{\mathrm{block}}(\mathrm{SNR})\right)$, the lower bound (38) to $E_r(R)$ is 0. However, the
upper bound (21) is $o(1)$ in this range. Hence, we can say that

$$E_r(R) = o(1) \quad \text{for} \quad C_T^{\mathrm{block}}(\mathrm{SNR}) \le R \le C^{\mathrm{block}}(\mathrm{SNR}). \qquad (43)$$

Equations (41)-(43) together characterize the random coding error exponent for the non-coherent
channel. This completes the proof of Theorem 4.
5 Conclusions

In this paper, we have computed the capacity and error probability for the non-coherent
wideband MIMO channel. The effects of coherence length and number of transmit and
receive antennas on capacity and reliability have been examined. The analysis has shown that
though the number of transmit antennas does not affect the linear capacity term, it does
affect the sublinear capacity term, i.e., the approach of capacity to the wideband limit with
increasing bandwidth. We have also established conditions on the channel coherence length
and number of antennas for the non-coherent capacity to be the same as the coherent
capacity in the wideband regime. The error probability is shown to decay inversely with
coherence length and exponentially with the product of the number of transmit and receive
antennas. This highlights the importance of multiple transmit antennas, besides multiple
receive antennas, in the low SNR regime. An interesting observation has been that outage
probability dominates the error probability even at low SNR.
Appendix 1

Proof of Lemma 1

Proof of the first condition: For any $\alpha \in (0, 1]$ and $\gamma \in (0, \alpha)$, let there exist an input
distribution on X that satisfies the following:

$$\frac{1}{l} I(X; Y) \ge r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\gamma}). \qquad (44)$$

Let $Y_G$ be a matrix, with i.i.d. complex Gaussian entries, that satisfies the same power
constraint as the received matrix, Y, i.e. $E[\mathrm{trace}(Y_G Y_G^{\dagger})] = E[\mathrm{trace}(Y Y^{\dagger})]$. This makes
$h(Y) \le h(Y_G)$ and the entries of $Y_G$ i.i.d. $\mathcal{CN}(0, 1 + \mathrm{SNR})$. Moreover, conditioned on X, the
row vectors of Y are i.i.d. $\mathcal{CN}(0, X^{\dagger} X + I_l)$. We can thus upper bound the mutual information
as

$$I(X; Y) = h(Y) - h(Y \mid X)$$
$$\le h(Y_G) - h(Y \mid X)$$
$$= rl \log(1 + \mathrm{SNR}) - rt\, E[\log(1 + \|x_i^T\|^2)]$$
$$\le rl\,\mathrm{SNR} - rt\, E[\log(1 + \|x_i^T\|^2)]. \qquad (45)$$

Combining (44) and (45) and noting that the norms of the input vectors $\|x_i^T\|$ are identically
distributed, we see that if the input distribution satisfies (44), then it necessarily satisfies
the first condition.

Proof of Observing the structure of the optimal input |HI for the non-coherent MIMO 
channel, we can upper bound the mutual information as 

$$I(X; Y) \le I(A; Y \mid \Phi) + I(\Phi; Y \mid A), \qquad (46)$$

where $I(A; Y \mid \Phi)$ is the information conveyed by the norm of the transmitted signal vectors
given that the receiver has side information about their directions, and $I(\Phi; Y \mid A)$ is the
information conveyed by the direction of these vectors when the receiver has side information
about their norm. We establish upper bounds on these two terms.
Upper bound for $I(A; Y \mid \Phi)$:

When the receiver has side information about $\Phi$, it can filter out noise orthogonal to the
subspace spanned by the row vectors of $\Phi$ to obtain an equivalent channel

$$Y \Phi^{\dagger} = H A + W',$$

where $W'$ has the same distribution as W and there is no loss in information since $Y \Phi^{\dagger}$ is
a sufficient statistic for estimating X from Y. Therefore

$$I(A; Y \mid \Phi) = I(A; Y \Phi^{\dagger} \mid \Phi)$$
$$= I(A; H A + W' \mid \Phi)$$
$$\le \sum_{i=1}^{t} I(\|x_i^T\|;\, h_i \|x_i^T\| + w'_i \mid \phi_i^T)$$
$$\le \sum_{i=1}^{t} \sum_{j=1}^{r} I(\|x_i^T\|;\, h_{ij} \|x_i^T\| + w'_{ij} \mid \phi_i^T), \qquad (47)$$

where the last two inequalities follow from the chain rule of mutual information and the fact 
that conditioning reduces entropy. In order to get an upper bound on /(||xf ||; hjj||xf || + 
w[j\(f)J), we need to maximize this mutual information with the average power constraint 
^SNR and the constraint specified by If we relax the latter constraint then the mutual 
information is that of a single-input, single-output i.i.d Rayleigh fading channel with average 
power constraint ^SNR. From [TB^, we know that this mutual information is maximized by 
an on-off distribution of the form 

w.p. 1 — C 

for Vz G {1, ■ ■ ■ ,t} and some C > 0. This signalling scheme becomes increasingly "fiashy" as 
the SNR gets low, i.e., C — > as SNR — 0. Hence, (jTFj) becomes 

/(A; Y|$) 

<EEAI|xfll;h,-pfll+w:,|0T) 

i=l 3=1 

<EE^(pni) 

i=i j=i 

~ '^i^Clog(^), 

where the approximation is valid since we are in the low signal to noise ratio regime and 
>OasSNR— i>0. Therefore, we have 

-/(A;Y|$)<-Alog(-). (48) 
However, this on-off distribution minimizes (j21) also and hence the extra constraint does
not change the optimal input. Therefore, it suffices to consider on-off signals. Hence, © 
becomes 



(r + t), 



SNR^+" + 0(SNR^+"+^) 



2t 



>^log(l + ^SNR) 



> 



loerf— ) — lofff , 



(49) 



where the approximation is valid since ^^7^ ^ 1 as SNR 0, i.e. the peak amplitude 
becomes very large as the signal to noise ratio tends to 0. Combining (|48|) and (|49|) . we have 



^/(A;Y|$) < !l!-i^SNRi+" + 0(SNRi+"+^). 



2t 



(50) 



Upper bound for /($; Y|A): 

We can upper bound /($; Y| A) in terms of the mutual information of a single-input, single- 
output channel, i.e. 

t r 

/($; Y\A) <Y,Y. 5 yJl A' <Af , • • • , <^-i, <^+i, • • • , yf, • • • , yJ-1, yf+1, • • • , y^- 
i=i j=i 

The term inside the double summation represents the mutual information of the channel 
between the i*'^ transmit antenna and receive antenna when no other antenna is present 
and the norm of 5cf is known at the receiver. Since the input vectors are identically dis- 
tributed and the channel matrix has i.i.d. entries, the mutual information between any pair 
of transmit and receive antennas given that the other antennas are absent will be the same. 
Hence for alH G {1, . . . , t} and j G {1, . . . , r}, 

/($; Y| A) < rtli^ ; yf | A, . . . , cgL,, c^^,, . . . , yf , . . . , yj_,, yj^,, . . . , y^). 

We may thus consider the single-input, single-output channel between the i*'* transmit an- 
tenna and j*'* receive antenna: 

yj = hijilxfll + wj. 

Hence, 



/(*;Y|A) 



< rtJ(xf ;yj|||. 
= rtEI{5lJ;yJ\ 



(51) 
Since $I(x_i^T; y_j^T \mid \|x_i^T\|)$ is the mutual information of a single-input, single-output channel over
$l$ channel uses, it has a power constraint of $\frac{\|x_i^T\|^2}{l}$. This mutual information can be upper
bounded by the capacity of the AWGN channel with the same power constraint, i.e.

$$I(x_i^T; y_j^T \mid \|x_i^T\|) \le l \log\left(1 + \frac{\|x_i^T\|^2}{l}\right). \qquad (52)$$

Combining (51) with (52), we obtain

$$I(\Phi; Y \mid A) \le rt\, l\, E\left[\log\left(1 + \frac{\|x_i^T\|^2}{l}\right)\right]. \qquad (53)$$

From (46), (50) and (53), we obtain our upper bound to $I(X; Y)$ as

$$\frac{1}{l} I(X; Y) \le rt\, E\left[\log\left(1 + \frac{\|x_i^T\|^2}{l}\right)\right] + \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\gamma}). \qquad (54)$$

Combining (54) with (44) and noting that all the input vectors have identically distributed
norms, we see that the input distribution satisfying (44) satisfies the second constraint
also. This completes the proof of the lemma. □



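Appendix 2

Proof of Theorem 1

For any $\alpha \in (0, 1]$ and $\gamma \in (0, \alpha)$, let

$$C(\mathrm{SNR}) \ge C^*(\mathrm{SNR}) = r\,\mathrm{SNR} - \frac{r(r+t)}{2t}\,\mathrm{SNR}^{1+\alpha} + O(\mathrm{SNR}^{1+\alpha+\gamma}).$$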

This implies that there exists a probability distribution on X such that 

/(X; Y) > C*(SNR). 

From Lemma d we know that this distribution must satisfy the following constraints for all 
2G{l,...,t}: 



logfl 



tE 



log 1 



I -'Tii2\ 



I 



- ■^^-^SNR^+" + 0(SNR^+"+^), 

> siMR_ i!l±^SNR^+" + 0(SNR^+°+^) 



(55) 
(56) 



Using these constraints, we establish a necessary condition on the coherence length. As the 
norms of the transmitted signals are identically distributed, it suffices to consider only one 
of them. Therefore, we will omit the subscript, i, and define random variable b as 



Til 2 



t||x 

Ism 
The two constraints become



tE 



log [1 - 
log fl 



b/SNR 

t 

bSNR 

t 



Moreover, as 



2t 

> SNR 



E[h] 



(r + t) 



SNRi+° + 0(SNR^+"+^). 



/SNR 



1. 



(57) 
(58) 



(59) 
(60) 



Note that ()58I60|) do not depend on the coherence length, /, whereas ()57|) does. Also, the 
left hand side of (jFTj) is a monotonically decreasing function of /. Thus, to find how large the 
coherence length necessarily needs to be, we need to find the distribution on b that minimizes 
the left hand side of ^7\i subject to the constraints (j58l6Up . Using this distribution for b, 
we can obtain the necessary condition on the coherence length from (j37j) . 
For any /5 > 0, we can express as 



SNR (r + t), 



SNR^+" + 0(SNR^+"+^) 



t 

< E 



log (l 



bSNRy 

t ). 



Pr(b > tSNR-^)E[log fl + 



bSNR 

t 



b > tSNR-^ 

logfl 



< Pr(b > tSMR~^)E 
SNR 



log 1 + 



+ Pr(b < tSNR-^)E 
bSNR 



bSNR\ 

t ) 



t 



-Prfb > tSNR~^)E 



t 

bSNR 

t 



b>tSNR"'^ +Pr(b < tSNR"^)E 
bSNR 



b < tSNR-'^ 
bSNR 



t 



b < tSNR"'^ 



- log 1 



t 



Therefore, 

Pr(b > tSNR-^)E 



bSNR 

t 



log 1 + 



bSNR 

t 



b > tSNR- 



(61) 



< l!li^SNRi+" + 0(SNRi+"+^) 



When (3 >l,h> tSMR~^ implies ^ > 1 which makes ^ > log(l + ^). Hence, 
V/3 > 1 



Pr(b > tSNR-^)£;[b|b > tSNR-'^] < 
From Markov's inequality, V/3 > 1 



(r + t) 



SNR" + 0(SNR"+'^) = o(l) 



SNR^ 

Pr(b > tSNR-^) < = o(l). 

t 



(62) 



(63) 
When $\beta < 1$, $b < t\,\mathrm{SNR}^{-\beta}$ implies $\frac{b\,\mathrm{SNR}}{t} < 1$. Hence, (61) can be expressed as



(r + t), 



SNR^+" + 0(SNR 



t 



> Pr(b > tSNR-^)E 

> Pr (^tSNR-^ > b > tSNR^'^jE 

> Pr f tSNR"^ > b > tSNR^^'lE 



b > tSMR-'^ 



bSNR / bSNRx 

- log [1 + 

bSNR , / bSNR\ 

- log ( 1 + ^— j 



t V t 

l/bSNR\2 i/bSNR\3 



2 V t 



3 V t 



tSm-^ > b > tSNR^^ 



> Pr (tSNR-^ > b > tSNR-f^ 



isNR2a-/3) _ 1snr3(i-^) 

2 3 



Thus, V/5 e (0, 1) 



Pr (tSNR-^ > b > tSNR"^) < 



2(r + t) SNR2^-(^-") /SNR2^-(^-") 



t2 1-|SNR^-^ Vi-ISNRi-'' 



(64) 



Let us divide the interval [tSNR ^,tSNR ^], (3 G (0,1), into K > 1 finite intervals so that 
each interval is of length 



£ — 



i(SNR-^ -SNR-^) 
K 



Now, for any e > 

E^h\tSNR-^ >h> tSNR-f^ Pr (tSNR-^ >b> tSNR"^' 

K 

= ^E[b|tSNR-(^+*") > b > tSNR-['^+(^~^)"] Pr (^tSNR-('^+^^) > b > tSNR-['3+(*-^)"]) 
1=1 

K 

< t J]SNR-(^+^^) Pr (^tSNR-('^+^^) > b > tSNR-[^+(*-^)"l) 

i=l 
K 

< t^SNR-(^+^^) Pr (^tSNR-^ > b > tSNR-I^+(*-^)"l) 



K 



t 



2{r + t) 



1=1 

K 



S[\|R[2{/3+(*-l)e)-{l- 



1 _ 25|\|p[l-(/3+{i-l)e)] Vl _ ^S|\|R[l-(/3+{i-l)e)]A 



S|\|R[2(/3+(i-l)£)-(l-Q)] 



(65) 



S|\|R[^-(l-")+(*-2)e] 



+ O 



S|\|R[/3-(l-<^)+{i-2)£] 



t ^ Li _ |SNR[^-(^+(*-i)^)l " ^Vi _ |SNR[i-(^+(^-i)^) 
1=1 i i 

Equation (j6l|l is used to obtain (j65|) . Let (3 = 1 — a + 2e, where e G (0, f ). Thus, 

E[b|tSNR-^ > b > tSNR-^] Pr(tSNR-^ > b > tSNR"'^) 

SNR^" / SNR*" 



K 



^ 2(r + t) ^ I 

1=1 3 



+ 



1 - f SNR°-(*+^)" 
Since, 



SNR" 



1 - |SNR"-(*+i)" 



0(1) ytE{l,...,K}, 



we have 
E 



b|tSNR-^ > b > tSNR-(^-"+2") Pr (tSNR-^ > b > tSNR^^^-^+^e)^ < ^^^^ ^gg^ 

From (jnH), we have for /5 = 1 — a + 2e, where e G (0, 

2(r + t) SNR^-"+'" 



2 



Pr (tSm-^ > b > tSNR-/^) < 



t2 1 - |SNR"-2s 



+ o 



SNR 



l-Q+4e 



^SNR' 



o(l). (67) 



From (jn2inninninZ|) , we know that for < e < a 

Pr(b > tSNR-(^-"+^)) = o(l), 
Pr(b > tSNR-(^-"+^))E[b|b > tSNR-(^-"+^'] = o(l), 



which implies 



Pr(b < tSNR-(^-"+^)) =0(1), 
Pr(b < tSNR-(^-°+^))^[b|b < tSNR~(^~"+^)] = 0(1). 



Hence, the distribution on b that minimizes E 
()58|60p . is the on-off distribution: 

b = 

where. 



loK 1 



biSNR 



subject to the constraints 



tSNR-(^-"+^) w.p. r] 
w.p. 1 — 1] 



SNR 



l-a+e 



Hence, with this on-off distribution on b, ()57|) becomes 



jr + t) 
2t 



SNR^+" + 0(SNR^+"+^) > "^^^ — - log (l + /SNR"-^) . 



(68) 



Now, 



SNR 



log (l + ZSNR" 

SNR^-^"log(l + -^SNR 



> 



{r + tf 

-n^^(r + t)^ 
^^-^SNR^+" + 0(SNR^+"+^). 
Thus, with / = ■^^:^^SNR ^"^^ (jUH)) is not satisfied. However, the right hand side of (|UH)) is 
a monotonically decreasing function of /. Hence, for the constraint in (jUHj) to be met 

/ > r:rSNR-2°+^ Ve e (0, a). 

Thus, we see that if an input distribution satisfies (55), (56) and (59), then the coherence length must 
necessarily obey 

(r + ty 

This completes the proof of the theorem. □ 



Appendix 3 



We compute the capacity of an i.i.d Rayleigh fading MIMO channel when CSI is unavailable 
at both the transmitter as well as the receiver. It is shown in jH] that increasing the number 
of transmit antennas beyond the coherence length does not increase capacity. Hence, from a 
capacity point of view, it suffices to use only one transmit antenna (t = 1). We will therefore 
consider the capacity of a single-input, multiple-output (SIMO) channel. 
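Let us pick on-off signaling to communicate over the channel. This signaling scheme is later 
proved to be optimal for the i.i.d Rayleigh fading MIMO channel. We specify the signaling 
as 

$$x = \begin{cases} \sqrt{A} & \text{w.p. } \omega \\ 0 & \text{w.p. } 1 - \omega \end{cases} \tag{69}$$

where $A \in \Re^{+}$ and $\omega = \frac{\mathrm{SNR}}{A}$. With this signaling, we have the following probability 
distributions 

$$p_{\mathbf{y}|x=0}(\mathbf{y}) = \frac{1}{\pi^{r}} \exp\left(-\|\mathbf{y}\|^{2}\right),$$

$$p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y}) = \frac{1}{\pi^{r}(1+A)^{r}} \exp\left(-\frac{\|\mathbf{y}\|^{2}}{1+A}\right).$$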
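The mutual information $I(x; \mathbf{y})$ can be written as $I(x; \mathbf{y}) = h(\mathbf{y}) - h(\mathbf{y}|x)$. Now, 

$$\begin{aligned}
h(\mathbf{y}) &= -\int p_{\mathbf{y}}(\mathbf{y}) \log\left(p_{\mathbf{y}}(\mathbf{y})\right) d\mathbf{y} \\
&= -\int \left[(1-\omega)\,p_{\mathbf{y}|x=0}(\mathbf{y}) + \omega\,p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})\right] \log\left[(1-\omega)\,p_{\mathbf{y}|x=0}(\mathbf{y}) \left(1 + \frac{\omega}{1-\omega}\,\frac{p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})}{p_{\mathbf{y}|x=0}(\mathbf{y})}\right)\right] d\mathbf{y} \\
&= -\log(1-\omega) - (1-\omega)\int p_{\mathbf{y}|x=0}(\mathbf{y}) \log\left[1 + \frac{\omega}{1-\omega}\,\frac{p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})}{p_{\mathbf{y}|x=0}(\mathbf{y})}\right] d\mathbf{y} \\
&\quad - \omega\int p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y}) \log\left[1 + \frac{\omega}{1-\omega}\,\frac{p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})}{p_{\mathbf{y}|x=0}(\mathbf{y})}\right] d\mathbf{y} + h(\mathbf{y}|x) + \omega D\left(p_{\mathbf{y}|x=\sqrt{A}} \,\|\, p_{\mathbf{y}|x=0}\right).
\end{aligned}$$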



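The divergence $D\left(p_{\mathbf{y}|x=\sqrt{A}} \,\|\, p_{\mathbf{y}|x=0}\right)$ is the divergence between two Gaussian random vectors 
and is therefore 

$$D\left(p_{\mathbf{y}|x=\sqrt{A}} \,\|\, p_{\mathbf{y}|x=0}\right) = r\left(A - \log(1 + A)\right).$$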
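As a sanity check on this closed form, the following minimal sketch estimates the divergence by Monte Carlo, assuming the two conditional laws are $\mathcal{CN}(0, (1+A)I_r)$ and $\mathcal{CN}(0, I_r)$ as stated above. The function names, sample count and seed are illustrative choices and are not part of the original derivation.

```python
import math
import random


def gaussian_divergence_closed_form(r, A):
    """Closed form r * (A - log(1 + A)), in nats."""
    return r * (A - math.log(1.0 + A))


def gaussian_divergence_monte_carlo(r, A, samples=20000, seed=0):
    """Estimate D(CN(0,(1+A)I_r) || CN(0,I_r)) by averaging the log-likelihood ratio.

    Samples are drawn from the 'on' law, i.e. y given x = sqrt(A).
    """
    rng = random.Random(seed)
    # Each real and imaginary part has variance (1 + A) / 2 under the 'on' law.
    sigma = math.sqrt((1.0 + A) / 2.0)
    total = 0.0
    for _ in range(samples):
        sq_norm = sum(rng.gauss(0.0, sigma) ** 2 for _ in range(2 * r))
        # log p_{y|x=sqrt(A)}(y) - log p_{y|x=0}(y); the pi^r factors cancel.
        total += -r * math.log(1.0 + A) + sq_norm * A / (1.0 + A)
    return total / samples


if __name__ == "__main__":
    r, A = 2, 3.0
    print("closed form :", gaussian_divergence_closed_form(r, A))
    print("Monte Carlo :", gaussian_divergence_monte_carlo(r, A))
```

The estimate should agree with the closed form up to sampling noise, which shrinks as `samples` grows.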
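The expression for the mutual information becomes 

$$I(x; \mathbf{y}) = r\,\mathrm{SNR} - r\,\mathrm{SNR}\,\frac{\log(1+A)}{A} - \log(1-\omega) - I(\mathrm{SNR}, A)$$

where, 

$$I(\mathrm{SNR}, A) = I_1(\mathrm{SNR}, A) + I_2(\mathrm{SNR}, A), \tag{70}$$

$$I_1(\mathrm{SNR}, A) = (1-\omega)\int p_{\mathbf{y}|x=0}(\mathbf{y}) \log\left[1 + \frac{\omega}{1-\omega}\,\frac{p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})}{p_{\mathbf{y}|x=0}(\mathbf{y})}\right] d\mathbf{y},$$

$$I_2(\mathrm{SNR}, A) = \omega\int p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y}) \log\left[1 + \frac{\omega}{1-\omega}\,\frac{p_{\mathbf{y}|x=\sqrt{A}}(\mathbf{y})}{p_{\mathbf{y}|x=0}(\mathbf{y})}\right] d\mathbf{y}.$$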
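The mutual information of this on-off scheme can also be evaluated numerically, which gives a way to check the expansion for moderate parameter values. The sketch below is an illustration, not the authors' computation. It assumes that $\|\mathbf{y}\|^2$ is Gamma distributed with shape $r$ and scale $1$ when $x = 0$, and scale $1 + A$ when $x = \sqrt{A}$, so that the $2r$-dimensional integral reduces to a one-dimensional one. The function names, the midpoint rule and the truncation point are assumptions made for this example.

```python
import math


def onoff_mutual_information(r, snr, A, steps=200000):
    """I(x; y) in nats for x = sqrt(A) w.p. omega = snr / A and x = 0 otherwise.

    Uses I = (1 - omega) D(p_off || p_y) + omega D(p_on || p_y), with every
    density expressed through s = ||y||^2.
    """
    omega = snr / A
    log_norm = math.lgamma(r)  # log (r - 1)!

    def log_gamma_pdf(s, scale):
        return (r - 1) * math.log(s) - s / scale - r * math.log(scale) - log_norm

    s_max = (1.0 + A) * (r + 60.0)  # truncation; the neglected tail mass is tiny
    ds = s_max / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * ds  # midpoint rule
        log_off = log_gamma_pdf(s, 1.0)
        log_on = log_gamma_pdf(s, 1.0 + A)
        # log of the mixture density, computed with log-sum-exp for stability
        a = math.log1p(-omega) + log_off
        b = math.log(omega) + log_on
        m = max(a, b)
        log_mix = m + math.log(math.exp(a - m) + math.exp(b - m))
        total += (1.0 - omega) * math.exp(log_off) * (log_off - log_mix) * ds
        total += omega * math.exp(log_on) * (log_on - log_mix) * ds
    return total


def divergence_bound(r, snr, A):
    """omega * D = r*SNR - r*SNR*log(1 + A)/A, an upper bound on I(x; y)."""
    return r * snr - r * snr * math.log(1.0 + A) / A


if __name__ == "__main__":
    r, snr, A = 2, 0.05, 20.0
    print("I(x; y)   :", onoff_mutual_information(r, snr, A))
    print("omega * D :", divergence_bound(r, snr, A))
```

Because $I(x;\mathbf{y}) = \omega D(p_{\mathbf{y}|x=\sqrt{A}} \| p_{\mathbf{y}|x=0}) - D(p_{\mathbf{y}} \| p_{\mathbf{y}|x=0})$, the numerical value always lies below the first two terms of the expansion; the gap is $\log(1-\omega) + I(\mathrm{SNR}, A)$.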

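At low SNR, $A$ takes very high values, therefore the mutual information can be written as 

$$I(x; \mathbf{y}) = r\,\mathrm{SNR} - r\,\mathrm{SNR}\,\frac{\log(1+A)}{A} - I(\mathrm{SNR}, A). \tag{71}$$

We will now compute $I_1(\mathrm{SNR}, A)$ and $I_2(\mathrm{SNR}, A)$. Let us define $\varsigma^*$ to be such that 

$$\frac{\mathrm{SNR}}{A(1+A)^{r}} \exp\left(\frac{A\varsigma^*}{1+A}\right) = 1.$$

Note that 

$$\frac{\varsigma^*}{1+A} = \frac{\log(A)}{A} + \frac{r\log(1+A)}{A} + \frac{\log\left(\frac{1}{\mathrm{SNR}}\right)}{A}.$$

Thus, 

$$\lim_{A \to \infty} \frac{\varsigma^*}{1+A} = 0.$$

We will use this in future derivations. 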
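The threshold can be written in closed form by taking logarithms of its defining relation, assuming that relation reads $\frac{\mathrm{SNR}}{A(1+A)^r}\exp\left(\frac{A\varsigma^*}{1+A}\right) = 1$. The short sketch below checks the defining relation and shows the ratio $\varsigma^*/(1+A)$ shrinking as $A$ grows. The function names are hypothetical and chosen only for this illustration.

```python
import math


def varsigma_star(snr, A, r):
    """Solve snr / (A (1 + A)^r) * exp(A * s / (1 + A)) = 1 for s."""
    return (1.0 + A) / A * (math.log(A) + r * math.log(1.0 + A) + math.log(1.0 / snr))


def defining_product(snr, A, r):
    """Left-hand side of the defining relation; equals 1 at s = varsigma_star."""
    s = varsigma_star(snr, A, r)
    return snr / (A * (1.0 + A) ** r) * math.exp(A * s / (1.0 + A))


if __name__ == "__main__":
    snr, r = 1e-3, 2
    for A in (10.0, 1e3, 1e6):
        ratio = varsigma_star(snr, A, r) / (1.0 + A)
        print(A, defining_product(snr, A, r), ratio)
```

For fixed SNR the ratio decays like $\log(A)/A$, so it reaches zero only slowly; at low SNR, $A$ has to be large compared with $\log(1/\mathrm{SNR})$ before the ratio is small.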
Computing $I_1(\mathrm{SNR}, A)$: 
Converting to spherical coordinates in 2r dimensions, we have for large A: 

h{sm,A) 



1 


(r — 




exp( 




(r- 


1)! 


exp( 




(r- 


1)! 



exp(-^) log 



SNR / Aq 



(^ + ^*)""^exp(-^)log 
= /ii(SNR, A) + /i2(SNR, A) + o(SNR'), 



1 + exp 



l + AJi 



where, 



7n(SNR,A) 
/i2(SNR,A) 



exp(-<^* 
(r- 1)! 
exp(-c 



(? + ?*r-'exp(-^)log 
(<; + <;*Y~^ ex.p(-<;) log 



1 + exp 
1 + exp 



A^ 
1 + AJi 

A<i 
1 + A 



(r-iy. Jo 

Computing $I_{11}(\mathrm{SNR}, A)$: 
7n(SNR,A) 

/_°^ + e^P(-0 log [l + exp {^)]d<^ 



< 



exp(- 


-<^*) 


(r- 


1)! 


exp(- 


-^*) 


(r- 


1)! 


exp(- 


-^*) 



^j<^ + <^*r^exp(-,)exp(^ 



dq 



exp(- 

cxp(- 



(<? + <?*)'■ exp( 



-)d<; 



{1 + AY ^ A^* ^ ^r-l 

(r-1)! 

(r - 1)! 1 + A 

/K(SNR,A), 



1 + 

Ac 



1 + A 

) / c'-iexp(-c)rfc 



)[r(r)-r(r,^)j 



where. 



7{;(SNR,A) = ^-(1 + Ay 



SNR 



A(l + A)' 



j=0 



Moreover, 

/ii(SNR,A) 

exp(-c* 



(r-l)! 



(c + C*r~'exp(-c)log 



1 + exp 



1 + Ayj 



dc 
> /[i(SNR,A) 
= /fi(SNR,A) 
= /fi(SNR,A) 

= /[;(SNR,^) 
= 7fi(SNR,>l) 

= 7K(SNR,A) 

-/fi(SNR,A) 



2(r-l)! J S"" exp(-Oexp 
exp(-^*^ 



2(r- 1)! 
exp(-? 
2(r- 1 

1 

2(r - 1)! 
2(r- 1)! 

2 

(-ir-i 



_*\r— 1 



+ ^ exp ( -p— )(i<j 



A- 1 



\A + 1 







-1 


^0 






exp 


(- 








exp 


(- 








\A + 




r 


VA- 


iJ 




\A + 


li 


r 


VA- 


iJ 





A- 1 

A + l 



2Aq* 
A+l 



A+l 
A-l 



exp(^)d^ 



2A^* \ + 1" 



exp(-,*) J] 
SNR 



'■-1 \-AilLc*M 



— exp 



A{i + Ay\ 



1 r-1 \_A^^*]j 



l+A 



j=0 



2Aq* 
A+l 



o(SNR2 



Let 



7^1(SNR,A) = 



(-1)-^ 


rA + 


li 


r 


[ SNR ] 


2 


U- 


iJ 




[a{i + ay\ 



1+ 



1- r '■-1 r A-l *17_, 



l+A 



j=0 



A+r 



Thus, we have 



7fi(SNR,A) -7fi(SNR,A) + o(SNR^) < 7ii(SNR,A) < 7[;(SNR,^) 



u , 



Since tt-t ^ as A — > oo, we have 



l+A 



hm 7{;(SNR,A) = 0, 

A^oo 



hm 7fi(SNR,A) = 0, 

A^oo 

hm 7ii(SNR,>l) = 0. 

A— »oo 



Computing $I_{12}(\mathrm{SNR}, A)$: 

/i2(SNR,A) 
_ exp(-^*) 
- (r-1)! 
_ exp(-<j*) 



<j + exp (-<j) log 



_*\r— 1 



(r-1)! 
7iySNR,A) + 722(SNR,A), 



cxp(-^) 



l + A 



+ log 



1 + exp 



TTl)]]* 



where, 



/i2(SNR,A) 



4(SNR,A) 
exp(-^* 



exp(-^*) 


r ^ 1 




(r-1)! 


[i + a\ 


I 



'?('? + '?*)'' 'exp(-c^)rf^. 



(r-1)! Jo 



{<; + <;*Y-'^ exp{-q) log 



1 + exp 



— ) 

1 + 



(73) 



(74) 
Now, 



il,{sm,A) 

exp(-^*) 



(r-1)! 
1 



r ^ 1 




11 + a\ 


I 



(r-l)! 
_ 1 

1 

" (r- 1)! 
= exp(-^*) 



1 + ^ 



1 + A 



r(r + l,^*)-^T(r,^*) 
r(r,^*)(r-^*) + (^*rexp(-,*)] 



?! - 



r — <j' 



+ 



V-l)! 



I SNR ] 




r ^ 1 




U(l + A)^J 




11 + a\ 





r — q' 



+ 



;r-l)!J 



/i^2'^(SNR,A)SNR^+^, 



where, 



4^(SNR,A) 



r 1 1 




r ^ 1 








[i + a\ 





r-l 



(r-l)!L^ j! 



+ 



[r-iy.l 



Since — > as A — > oo, we have 



Thus, 



hm (SNR,^) = 0. 



hm /iySNR,A) = 0. 

A— »oo 



We will now compute (SNR, A). 
4(SNR,/l) 

~^ *^ ' 0^{<; + <;*y-^exp{-<;)log 



< 



exp( 


-^^*) 


(r — 


1)! 


exp( 


-^*) 


(r — 


1)! 


exp( 


-^*) 


(r- 


1)! 


1 




(r- 





l + exp(-^)_ 



+ ^*)'' exp(-^) exp 



1 + A 



1 + Ay ( Aq 
, . exp —— 



<j'' exp(-<j)(i<j 



V 1+A ) 
where, 



1 



r - 1)! 

- 1+A 
,1 + 2A 

/,Y(SNR,A), 



1 + A v 
1 + 2a\ 



exp 



l + A 

r-l \( 1+2A 
1+A 



1 + A 



[(■ 



/iY(SNR,A) 



r SNR 1 


1+i 




U(i + A)'-J 




Li + 2aJ L 



r-l \(\+2A 



■ i=0 



Moreover, 



4(SNR,A) 

> 4^(SNR,A) 

= /iT(SNR,/l) 

= /iY(SNR,A) 

= /iY(SNR,A) 

= 4''(SNR,A) 

/i22'^(SNR,A) = 



Since — > as A ^ oo. 



where. 



exp(-£*) 

2(r-l)! Jo 
1 

exp 



2(r - 1)! 
1 

2(r - 1)! 

exp(-^*) 
2 

1 

r2L 



2A^ 
1 + A 



*\r— 1 



exp(— ^) exp ( — 



2A<; 
1 + A 



1 + A 
1 + 3aJ 

■ 1 + A r 
.1 + 3aJ 



exp 



2A^ 



j=0 



1 + A 

A 



r(r, 



,1 + 3A 
l + A 



)<^*) 



r-l [(l_t3A)^*l. 



/,t(SNR,A). 



r2 



hm /22^(SNR,A) = 0, 

A— >CXD 



hm /iY(SNR,A) = 0, 



^ hm /^2(SNR, A) = 0. 

A— >oo 

Substituting (f75j) and (f7B|) in (fTIj) . we have 

hm /i2(SNR, A) = 0. 

A^oo 

Substituting (|7H|) and (|77j) in (ff2|) . we obtain 

$$I_1(\mathrm{SNR}, A) = o(\mathrm{SNR}^2).$$



r SNR 1 




rl + Ai^r 


U(l + Ay)J 




Li + 3aJ L 



r-l ^(- l+3A v>.],- 



E 

j=o 
Computing $I_2(\mathrm{SNR}, A)$: 
SNR 



/2(SNR,A) 



7r''^(l + Ay 



exp(--^)log[l + 



SNR 



■ exp(- 



A 



Converting to spherical coordinates in 2r dimensions, we have for large A: 

hism^A) 



SNR 



r r-l I ^ M , SNR A 



+ l)'-(r - 1) 
= /2i(SNR, A) + /22(SNR, A) + o(SNR2), 



^(1 + ^)' 



1 + A' 



(79) 



where 

72i(SNR,A) = 

/22(SNR,^) = 



SNR 



A(A+l)'(r-l)!yo 
SNR 



,/%-exp(-^)log[l 



SNR /l<^ 



■l + A' 



A{A + iy{r -I) 
Computing $I_{21}(\mathrm{SNR}, A)$: 
/2i(SNR,A) 



- /f ^'-^exp(-^)log[l + ^,,^^exp(:^)]d,. 



A{i + Ay 



■l + A' 



SNR 



< 



A{A + iy{T-iyj, 

SNR /• 

SNR i2 1 



<j'' exp(- 



r-l 

^ exp 



l + A 



)log 



1 + 



SNR ^ 



A{l + Ay ^^^^ VI + ^7 



SNR 



A(l + Ay-J (r-l)!7o 
SNR ^^\A + l 



r—l 

q exp 



1 + AJ A{1 + Ay 
A-1 



exp 



exp 
l + A 



A{i + Ay 

SNR 

A{i + Ay 

SNR 

A{i + Ay 

SNR 



A-1 
^\A + 1 
.A-1 

2rA+ In*- 



A + 



(r-iyjo 

(r - 1) 



^exp(^)d^ 



A- 1 



A + 1 

A-1 



I^^{SNR,A) SNR^+3 +o(SNR') 



(r-l)! 

-ly-^ 

(r-l)! 



r — 



l)!exp(^),*)j; ^--^^^ 



(r-l)! 



'A + 
SNR 



A(l + A)' 



i=o 



-(r-l)!] 

(r-l)! 



where, 



72';(SNR,>1) = 



A + l 



r-l r / A-l 



[A{i + AyY+^ 
Now, 

/2i(SNR,A) 



> 



SNR 




[ ^exp 
Jo 


A{A + lY{r- 


1)! 


SNR 




^0 


A{A+iy{r- 


1)! 


SNR 




/ ^'■"^ exp 

^0 


2A(A + iy(r- 


-1)! 



SNR 



+ AJ A(l + Ay Vl + ^ 

^ \ SNR / A 



1 + A 



SNR 



■ exp 



/2^i(SNR,.l) SNR'+^ +o(SNR") 



i+- 



SNR 



A{i + Ay 



/aKSNR,^) SNRi+^+o(SNR 



SNR 



IA{1 + Ayi (r- 1)! 



-ly 



A+1 



L2A-1J L 



9 4 — 1 

r(r,-(^^K*)-r(r) 



(SNR,A)]SNR^+^ +o(SNR2 



SNR 



{-ly 



A{l + Ayi (r-l)! 



J (r- 



A+li 



2A-1 



(r-l)! 



SNR , 



/2^i(SNR,^) - -/2'i(SNR,^) SNR^+i +o(SNR 



where, 



iii{sm,A) 



[-ly 



[A{i + Ay]'+^ 



A + ir^[-(^K*p- 



2^ 



Combining the lower and upper bounds, we have 
/2^i(SNR,^) - ^/2^i(SNR,^)lSNR^+^ + o(SNR') 



</2i(SNR,^) < /2^i(SNR,A) SNR'+^ + o(SNR 



>i+4 



At low SNR, 



1+A 



as A — > oo. Thus, we have 

21 
rL 



lim /2^i(SNR,^) = 

A-*oo 



lim 72i(SNR,^) =0 

A— »oo 



Therefore, 

$$I_{21}(\mathrm{SNR}, A) = o(\mathrm{SNR}^2). \tag{80}$$
Computing $I_{22}(\mathrm{SNR}, A)$: 
/22(SNR,^) 



SNR 



A{A + iy{r-l)\ 



SNR ii+i 



(r- 1)! IA{1 + A) 
1 



SNR / Aq 

1 + exp ^ 



l + A 



(r- 1)! LA(1 + A) 



<j + <j ' exp ( - ^ 1 log 



SNR ii+i 



(r- 1)! LA(1 + A)' 



]'"A-r'-(-TT7)[Tfl-'-h-(-Tfl)] 



(r- 1)! L>1(1 + >1)' 



"SNRi+i 



l + A 

4(SNR,A) + 4(SNR,A) 



(81) 



where, 
4(SNR,A) 
4(SNR,A) 

Now, 

/22(SNR,A) 



l + exp(-^)]ci,. 



^(^ + ^*)''"^exp 



l + A 



(1 + Ay+^ 



(1 + ^) 

1 

(1 + 



1 + >1^J 



exp 



l + A 



r(l + Ay+' -,*{! + AY\T{r, ^) + (1 + A){,*Y exp 



i + ^yj 



l + A 



/2^2^(r,SNR,A), 



where. 



.-1 f^p- 



4^(r,SNR,A)=r![^li±^ 



r(l + A)J 



l + A, 



Since — >^ as A — > oo. 



Urn 72^2 (^, SNR, A) = r! 

A— »oo 



(82) 
Thus, 

We now compute /|2(SNR, A). 
4(SNR,>1) 



Ar\ 

^Ta' 



(83) 



t 



l + A 



log 



1 + exp ( - 



V \ + a)\ 



l + A> 

dt 



dt 



A{1 + 
1 



1 

^exp(f) J^^ f-'eM-t)dt 



A{i + Ay 

(r-1)! 



■exp(r)r(r, t* 



r-1 



a(i + ay2-. ji ■ 

Moreover, 
4(SNR,^) 



t 



)^"K-Tfl) 



l + A. 
t 

l + A 



exp 
exp 



2At 
l + A 



dt 



dt 



A{l + Ayj^^ j\ J 2A{l + AyJo 
(r-1)! g(t*)^- 



(t + r)'-iexp( -iii^^ldi 



1 + ^ 



+ j! 

(r-1)! y^(r)^- 
A(l + A)'^ ^ 



j=0 



J! 



2^(1 + ^)' 

(r-1)! 
2A{1 + Ay 



exp 



/ (l + 2A)t* >. 
V l + A J 



l + A 
1 + 2A, 



^ (i + 2A)r 

^' l+A ^ 



1 + 2AJ ^ ?! 

i=o 



Now as A — > oo, ^ 0. Therefore, 



hm -j^, \— 

A-^oo A(i + Ay 



r-1 



j=0 



hm 

A— ioo 



(r-1)! '^it*y 
,A(1 + A)'-^ j\ J 2A(1 + A) 



r-1)! ri + -4rg [(^)^t 



1 + 2AJ ^ 



J! 
As both the upper and lower bounds go to 0, we have 

hm JgySNR, A) = 0. 



Substituting 



and dHU) in (jHH), we have 

i22ism,A) 



1 


r 1 1 




1 

A 


- Ar\ - 


(r- 1)! 


[a{i + ay\ 




11 + a\ 



+ o(SNR2 



Therefore, 

$$I_2(\mathrm{SNR}, A) = r A^{-\frac{r+1}{A}}\,\mathrm{SNR}^{1+\frac{1}{A}} + o(\mathrm{SNR}^2). \tag{85}$$
Substituting (jZHI) and (jHSI) in (HH) and (jZOl), we obtain 

$$I(x; \mathbf{y}) = r\,\mathrm{SNR} - r\,\mathrm{SNR}\,\frac{\log(1+A)}{A} - r A^{-\frac{r+1}{A}}\,\mathrm{SNR}^{1+\frac{1}{A}} + o(\mathrm{SNR}^2).$$

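Let the capacity of the channel be $C(\mathrm{SNR})$. Since on-off signaling may not be optimal for 
the channel, we will denote the highest achievable rate using on-off signaling as $C_{\text{on-off}}(\mathrm{SNR})$. 
$C_{\text{on-off}}(\mathrm{SNR})$ is given by 

$$C_{\text{on-off}}(\mathrm{SNR}) = \max_{A} I(x; \mathbf{y}) = r\,\mathrm{SNR}\left[1 - M^*(\mathrm{SNR})\right] + o(\mathrm{SNR}^2),$$

where, 

$$M^*(\mathrm{SNR}) = \min_{A}\left[\frac{\log(1+A)}{A} + A^{-\frac{r+1}{A}}\,\mathrm{SNR}^{\frac{1}{A}}\right] = \min_{A}\left[\frac{\log(A)}{A} + A^{-\frac{r+1}{A}}\,\mathrm{SNR}^{\frac{1}{A}}\right]. \tag{86}$$

The last equality holds since $A$ is large. Let us denote 

$$M(A, \mathrm{SNR}) = \frac{\log(A)}{A} + A^{-\frac{r+1}{A}}\,\mathrm{SNR}^{\frac{1}{A}}.$$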
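The minimization over $A$ has no obvious closed form, but it is easy to explore numerically. The sketch below assumes the penalty has the form $M(A, \mathrm{SNR}) = \frac{\log(A)}{A} + A^{-\frac{r+1}{A}}\mathrm{SNR}^{\frac{1}{A}}$ and searches a geometric grid of $A$ values. Restricting the search to $A \geq e$ is an assumption standing in for "$A$ is large", and the function names, grid range and grid size are illustrative.

```python
import math


def penalty(A, snr, r):
    """M(A, SNR) = log(A)/A + A^(-(r+1)/A) * SNR^(1/A), second term via logs."""
    second = math.exp(-((r + 1) * math.log(A) + math.log(1.0 / snr)) / A)
    return math.log(A) / A + second


def minimize_penalty(snr, r, a_min=math.e, a_max=1.0e4, points=20000):
    """Grid search for the minimizing A on a geometric grid over [a_min, a_max]."""
    best_A, best_M = a_min, penalty(a_min, snr, r)
    ratio = (a_max / a_min) ** (1.0 / (points - 1))
    A = a_min
    for _ in range(points - 1):
        A *= ratio
        M = penalty(A, snr, r)
        if M < best_M:
            best_A, best_M = A, M
    return best_A, best_M


def onoff_rate_estimate(snr, r):
    """Leading-order on-off rate r * SNR * (1 - M*), ignoring the o(SNR^2) term."""
    _, m_star = minimize_penalty(snr, r)
    return r * snr * (1.0 - m_star)


if __name__ == "__main__":
    for snr in (1e-4, 1e-6, 1e-10):
        A_opt, m_star = minimize_penalty(snr, r=1)
        print(snr, A_opt, m_star, onoff_rate_estimate(snr, 1))
```

The first term favors large $A$ and the second favors small $A$, so the minimizer trades them off. Since the second term is increasing in SNR for every fixed $A$, the minimized penalty shrinks as SNR decreases, although only slowly.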

We will prove the following theorem to get a lower bound on M*(SNR). 
Theorem 5 

A loglog(sNR) 



M*(SNR) > Mi(SNR) 



log( 



(87) 
Proof: We will prove this by contradiction. Let there be an $A_1$ such that the theorem does 
not hold. Since $\frac{\log(A_1)}{A_1} > 0$ and $A_1^{-\frac{r+1}{A_1}}\,\mathrm{SNR}^{\frac{1}{A_1}} > 0$, we have, 



< M,(SNR), 

Ai 



r+l 
A 



If (EHl) holds, we have 



SNR^ < Ml(SNR). 



Ai > log( 



SNR'' 



Moreover, 



A^ SNR^ 



r + l 
^1 



> ^1 ' 



exp( 



SNRiir 
r - 

(r + l)log(Ai), 
Ai 



SNRi 



>exp[-(r + l)ML(SNR)]e-^ 

As $\mathrm{SNR} \to 0$, we have 

exp[-(r + l)Mi(SNR)]e-i > Ml(SNR), 
A^^SNR^ > Ml(SNR). 

This contradicts (89), which completes the proof. 

To get an upper bound for M*(SNR), we pick a value of A. Let 



A, 



log log( 



SNR^ 



Now, 



M*(SNR) < 



log(^2 



+ A, ^2 SNR" 



We have 



log(A2 



Ao 



[log log( 



SNR' 



log log log (5^)] log log 



< 



[loglog(sM)]' 



(89) 



□ 



(90) 



(91) 
and, 



r + l 



SNR^ 



(92) 



< 



< 



< 



max 



1 


rSNRi 


1 




. r . 





1 


rSNRi 




. r - 



6^2 log(A2) 

SNRi^ 



^rSNRi^fe 



r J 

SNR 

r J 
1 



log{^) 



Equation (93) holds since $A_2 \gg 1$ for $\mathrm{SNR} \to 0$, which makes 



(93) 



(94) 



Combining (90), (91) and (94), we have 

M*(SNR) < 



leA2 \og{A2)\ 

[loglog(3^)]2 + l 



lo£ 



(95) 



Finally, using (jHUjl . Theorem 1 and we have 
rSNR - rSNRl!^^M#]!±i + ,(SNR^) 



log( 



SNR' 



< an-off(SNR) < rSNR - rSNR 



loglog(c^ 



o(SNR2). (96) 



Since on-off signaling may not be optimal, 

$$C_{\text{on-off}}(\mathrm{SNR}) \leq C(\mathrm{SNR}). \tag{97}$$

As conditioning reduces entropy, we can upper bound the input-output mutual information as 

$$I(x; \mathbf{y}) \leq \sum_{k=1}^{r} I(x; y_k). \tag{98}$$



Each term on the right hand side of (98) is maximized by an on-off distribution, and we 
know from (2^1 that with this distribution, the mutual information V/c e {1 . . . r} is 



J(x,yfc) < SNR -SNR 



!^^1#m)+o(snr^). 
Hence, we can upper bound the capacity as 



Since, 



C(SNR) < rSNR - rSNR ^"^^"^^.^^^^^ + o{Sm^) 



we have 



1 1 /' ^ ^ 

C(SNR) < rSNR - rSNR "^\snrJ ^ o{Sm^). (99) 
Combining (jSHl EH EHI) , we obtain 

rSNR - rSNRfl^^i^^^|%^l^ + o{Sm^) < C(SNR) < rSNR - rSNRi^^MlMl + o(SNR2](100) 



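We now introduce a notation for the approximation that ignores higher order logarithm 
functions. Let $f(\mathrm{SNR})$ and $g(\mathrm{SNR})$ be functions of SNR. We will denote 

$$f(\mathrm{SNR}) \doteq g(\mathrm{SNR}),$$

if 

$$\lim_{\mathrm{SNR} \to 0} \frac{\log f(\mathrm{SNR})}{\log g(\mathrm{SNR})} = 1.$$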
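To see what this notation discards, consider two illustrative functions that differ only by a higher order logarithm: $f(\mathrm{SNR}) = \mathrm{SNR}\,\frac{\log\log(1/\mathrm{SNR})}{\log(1/\mathrm{SNR})}$ and $g(\mathrm{SNR}) = \frac{\mathrm{SNR}}{\log(1/\mathrm{SNR})}$. These two functions are chosen for this sketch only. The code works with $\ell = \log(1/\mathrm{SNR})$ directly so that extremely small SNR values can be probed without floating-point underflow; the function name is hypothetical.

```python
import math


def log_ratio(ell):
    """Return log f / log g, where ell = log(1 / SNR) > 1.

    f = SNR * log(ell) / ell  and  g = SNR / ell, with SNR = exp(-ell).
    """
    log_f = -ell + math.log(math.log(ell)) - math.log(ell)
    log_g = -ell - math.log(ell)
    return log_f / log_g


if __name__ == "__main__":
    for ell in (10.0, 1e3, 1e6):
        print(ell, log_ratio(ell))
```

The ratio approaches 1 as $\ell$ grows, so $f$ and $g$ are equal in the sense defined above even though $f/g = \log\log(1/\mathrm{SNR})$ diverges.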

With this scaling, the inequalities in (100) become equalities and the capacity can be 
expressed as 

C(SNR) = rSNR - A.;i;J(SNR). 

where, 

A[-:d^(SNR) 



it,r),^,.^. . rsm 



log( 



SNR' 



Moreover, we also see that on-off signaling (69) is capacity achieving for the i.i.d Rayleigh 
fading MIMO channel in the wideband regime (keeping in mind our scaling, which ignores 
higher order logarithm functions). 